I haven't had a chance to look at the details of this issue, but we have seen Spark successfully read Parquet tables created by Impala.
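For reference, here is a minimal sketch of how we typically point Spark SQL (1.0.x) at an Impala-written Parquet directory; the HDFS path and table name below are just placeholders, and sc is the SparkContext from the shell:

  import org.apache.spark.sql.SQLContext

  val sqlContext = new SQLContext(sc)

  // Placeholder path: point this at the directory Impala wrote the Parquet files into.
  val impalaTable = sqlContext.parquetFile("hdfs:///user/hive/warehouse/my_impala_table")

  // Register the SchemaRDD so it can be queried with SQL (Spark 1.0.x API).
  impalaTable.registerAsTable("my_impala_table")

  sqlContext.sql("SELECT COUNT(*) FROM my_impala_table").collect().foreach(println)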
On Tue, Jul 22, 2014 at 10:10 AM, Andre Schumacher <andre.sc...@gmail.com> wrote:
>
> Hi,
>
> I don't think anybody has been testing importing of Impala tables
> directly. Is there any chance to export these first, say as
> unpartitioned Hive tables and import these? Just an idea..
>
> Andre
>
> On 07/21/2014 11:46 PM, chutium wrote:
> > no, something like this
> >
> > 14/07/20 00:19:29 ERROR cluster.YarnClientClusterScheduler: Lost executor 2
> > on xxxxxxxx02.xxx: remote Akka client disassociated
> >
> > ...
> > ...
> >
> > 14/07/20 00:21:13 WARN scheduler.TaskSetManager: Lost TID 832 (task 1.2:186)
> > 14/07/20 00:21:13 WARN scheduler.TaskSetManager: Loss was due to
> > java.io.IOException
> > java.io.IOException: Filesystem closed
> >         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:703)
> >         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:779)
> >         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
> >         at java.io.DataInputStream.readFully(DataInputStream.java:195)
> >         at java.io.DataInputStream.readFully(DataInputStream.java:169)
> >         at parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:599)
> >         at parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:360)
> >         at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:100)
> >         at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
> >         at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
> >         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
> >         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> >         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> >         at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
> >         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> >         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> >         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> >         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> >         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
> >         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> >         at org.apache.spark.scheduler.Task.run(Task.scala:51)
> >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:744)
> >
> >
> > ulimit is increased
> >
> >
> > --
> > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-1-SQL-on-160-G-parquet-file-snappy-compressed-made-by-cloudera-impala-23-core-and-60G-mem-d-tp10254p10344.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
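A side note on the trace quoted above: the "Filesystem closed" IOException shows up right after the executors are lost, so it may only be a secondary symptom of tasks that were still reading when their executor's shared, cached FileSystem instance was shut down. Purely as a guess (not a verified fix for this workload), two settings worth experimenting with:

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch only; the values here are placeholders, not a confirmed fix for this thread.
  val conf = new SparkConf()
    .setAppName("impala-parquet-read")
    // If YARN is killing executors (the "remote Akka client disassociated" messages),
    // extra memory headroom sometimes helps:
    .set("spark.executor.memory", "8g")
  val sc = new SparkContext(conf)

  // "Filesystem closed" often means a shared, cached FileSystem instance was closed
  // while other tasks were still reading; disabling the cache gives each consumer
  // its own instance:
  sc.hadoopConfiguration.setBoolean("fs.hdfs.impl.disable.cache", true)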