Hi,
When we call saveAsParquetFile on a SchemaRDD, we get the following error:
Py4JJavaError: An error occurred while calling o384.saveAsParquetFile.
: java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/output/DirectFileOutputCommitter
    at org.apache.spark.sql.parquet.InsertIntoParquetTable.execute(ParquetTableOperations.scala:240)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
    at org.apache.spark.sql.SchemaRDDLike$class.saveAsParquetFile(SchemaRDDLike.scala:76)
    at org.apache.spark.sql.api.java.JavaSchemaRDD.saveAsParquetFile(JavaSchemaRDD.scala:42)
https://issues.apache.org/jira/browse/SPARK-3595 seems to have addressed the
issue of respecting the OutputCommitter, but when I pull from master and try
the same thing, I still encounter this error.
I am on a MapR distribution, and my org\apache\hadoop\mapreduce\lib\output does
not contain DirectFileOutputCommitter.
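
In case it helps anyone repeat the check, here is a minimal sketch of one way
to scan a directory of jars for the missing class. The helper name
jars_containing and the directory passed to it are only illustrative
assumptions, not part of Spark or of my actual setup.

```python
import zipfile
from pathlib import Path

# Class reported missing in the NoClassDefFoundError, written as a jar entry path.
MISSING_CLASS = "org/apache/hadoop/mapreduce/lib/output/DirectFileOutputCommitter.class"


def jars_containing(jar_dir, entry=MISSING_CLASS):
    """Return the jars under jar_dir (searched recursively) that contain entry.

    jars_containing is a hypothetical helper written for this post.
    """
    hits = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files that are not valid zip archives.
            continue
    return hits
```

Calling jars_containing("/path/to/hadoop/lib") (placeholder path) returns an
empty list when no jar on that path ships the class, which would be consistent
with the NoClassDefFoundError above.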
Best Regards,
Santosh