I haven't had a chance to look at the details of this issue, but we have seen Spark successfully read Parquet tables created by Impala.
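For reference, here is a minimal sketch of how we typically point Spark SQL (1.0.x) at an Impala-written Parquet directory; the HDFS path and table name below are just placeholders, and sc is the SparkContext from the shell:

  import org.apache.spark.sql.SQLContext

  val sqlContext = new SQLContext(sc)

  // Placeholder path: point this at the directory Impala wrote the Parquet files into.
  val impalaTable = sqlContext.parquetFile("hdfs:///user/hive/warehouse/my_impala_table")

  // Register the SchemaRDD so it can be queried with SQL (Spark 1.0.x API).
  impalaTable.registerAsTable("my_impala_table")

  sqlContext.sql("SELECT COUNT(*) FROM my_impala_table").collect().foreach(println)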
On Tue, Jul 22, 2014 at 10:10 AM, Andre Schumacher <andre.sc...@gmail.com> wrote:
>
> Hi,
>
> I don't think anybody has been testing importing of Impala tables
> directly. Is there any chance to export these first, say as
> unpartitioned Hive tables and import these? Just an idea..
>
> Andre
>
> On 07/21/2014 11:46 PM, chutium wrote:
> > no, something like this
> >
> > 14/07/20 00:19:29 ERROR cluster.YarnClientClusterScheduler: Lost executor 2
> > on xxxxxxxx02.xxx: remote Akka client disassociated
> >
> > ...
> > ...
> >
> > 14/07/20 00:21:13 WARN scheduler.TaskSetManager: Lost TID 832 (task 1.2:186)
> > 14/07/20 00:21:13 WARN scheduler.TaskSetManager: Loss was due to
> > java.io.IOException
> > java.io.IOException: Filesystem closed
> >         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:703)
> >         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:779)
> >         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
> >         at java.io.DataInputStream.readFully(DataInputStream.java:195)
> >         at java.io.DataInputStream.readFully(DataInputStream.java:169)
> >         at parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:599)
> >         at parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:360)
> >         at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:100)
> >         at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
> >         at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
> >         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
> >         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> >         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> >         at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
> >         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> >         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> >         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> >         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> >         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
> >         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> >         at org.apache.spark.scheduler.Task.run(Task.scala:51)
> >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:744)
> >
> >
> > ulimit is increased
> >
> >
> > --
> > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-1-SQL-on-160-G-parquet-file-snappy-compressed-made-by-cloudera-impala-23-core-and-60G-mem-d-tp10254p10344.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
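A side note on the trace quoted above: the "Filesystem closed" IOException shows up right after the executors are lost, so it may only be a secondary symptom of tasks that were still reading when their executor's shared, cached FileSystem instance was shut down. Purely as a guess (not a verified fix for this workload), two settings worth experimenting with:

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch only; the values here are placeholders, not a confirmed fix for this thread.
  val conf = new SparkConf()
    .setAppName("impala-parquet-read")
    // If YARN is killing executors (the "remote Akka client disassociated" messages),
    // extra memory headroom sometimes helps:
    .set("spark.executor.memory", "8g")
  val sc = new SparkContext(conf)

  // "Filesystem closed" often means a shared, cached FileSystem instance was closed
  // while other tasks were still reading; disabling the cache gives each consumer
  // its own instance:
  sc.hadoopConfiguration.setBoolean("fs.hdfs.impl.disable.cache", true)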