Liu Dinghua created SPARK-22260:
-----------------------------------

             Summary: java.lang.RuntimeException: hdfs://HdfsHA/logrep/1/sspstatistic/_metadata is not a Parquet file (too small)
                 Key: SPARK-22260
                 URL: https://issues.apache.org/jira/browse/SPARK-22260
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.1
         Environment: spark1.6.1,YARN2.6.0
            Reporter: Liu Dinghua

The code that triggers the error is as follows:

    sqlContext.read.parquet("/logrep/1/sspstatistic")

The detailed error output is as follows:

17/10/12 10:41:04 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 1.0 (TID 3, slave05): java.io.IOException: Could not read footer: java.lang.RuntimeException: hdfs://HdfsHA/logrep/1/sspstatistic/_metadata is not a Parquet file (too small)
        at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:247)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$27.apply(ParquetRelation.scala:786)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$27.apply(ParquetRelation.scala:775)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: hdfs://HdfsHA/logrep/1/sspstatistic/_metadata is not a Parquet file (too small)
        at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:412)
        at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)
        at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        ... 3 more

17/10/12 10:41:04 INFO scheduler.TaskSetManager: Starting task 1.1 in stage 1.0 (TID 4, slave05, partition 1,PROCESS_LOCAL, 968186 bytes)