Hi, I am using the Spark 1.3 prebuilt release with Hadoop 2.4 support, running on Hadoop 2.4.0.
I wrote a Spark application (LoadApp) that generates data in each task and loads it into HDFS as Parquet files (using “saveAsParquet()” in Spark SQL). When a job uses only a few waves of tasks (1 or 2), LoadApp finishes after a few failures and retries, but when more waves (3) are involved, the job terminates abnormally. Every failure I see is:

“java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN”

and the stack trace is:

java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
	at parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:137)
	at parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:129)
	at parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:173)
	at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:152)
	at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
	at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
	at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:634)
	at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:648)
	at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:648)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:64)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

I have no idea what is happening, since the same job may fail or succeed for no apparent reason.

Thanks,
Yijie Shen
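
P.S. In case it helps, the write path looks roughly like the sketch below. This is a simplified stand-in, not the exact LoadApp code: the schema, the row-generation logic, and the argument handling are placeholders, and the call I described above as “saveAsParquet()” is the Spark 1.3 DataFrame method saveAsParquetFile.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object LoadApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LoadApp"))
    val sqlContext = new SQLContext(sc)

    // Placeholder schema; the real application uses its own columns.
    val schema = StructType(Seq(
      StructField("id", LongType, nullable = false),
      StructField("payload", StringType, nullable = true)))

    // Each partition (task) generates its own rows; the partition count
    // controls how many waves of tasks the job needs on the cluster.
    val numPartitions = args(0).toInt
    val rowsPerPartition = args(1).toInt
    val rows = sc.parallelize(0 until numPartitions, numPartitions).flatMap { p =>
      (0 until rowsPerPartition).map { i =>
        Row(p.toLong * rowsPerPartition + i, s"value-$i")
      }
    }

    val df = sqlContext.createDataFrame(rows, schema)
    // Spark 1.3 DataFrame API; this is the call whose per-task
    // ParquetRecordWriter.close() shows up in the stack trace above.
    df.saveAsParquetFile(args(2))

    sc.stop()
  }
}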