Hi, I don't think anybody has tested importing Impala tables directly. Is there any chance you could export them first, say as unpartitioned Hive tables, and import those instead? Just an idea..
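Rough sketch of what I mean, untested and with made-up table names, done through HiveContext in the spark-shell:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)  // sc is the SparkContext the spark-shell provides

// Copy the Impala table into a plain, unpartitioned Hive table
// (default storage format here; with Hive 0.13+ you could add
// STORED AS PARQUET to stay in Parquet).
hiveContext.hql("CREATE TABLE my_table_flat AS SELECT * FROM my_impala_table")

// Then query the flat table from Spark SQL instead of hitting the
// Impala-written parquet files directly.
hiveContext.hql("SELECT count(*) FROM my_table_flat").collect().foreach(println)

No idea whether that avoids the "Filesystem closed" errors, but it would at least tell you whether the Impala-written files themselves are the problem.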
Andre

On 07/21/2014 11:46 PM, chutium wrote:
> no, something like this
>
> 14/07/20 00:19:29 ERROR cluster.YarnClientClusterScheduler: Lost executor 2
> on xxxxxxxx02.xxx: remote Akka client disassociated
>
> ...
> ...
>
> 14/07/20 00:21:13 WARN scheduler.TaskSetManager: Lost TID 832 (task 1.2:186)
> 14/07/20 00:21:13 WARN scheduler.TaskSetManager: Loss was due to java.io.IOException
> java.io.IOException: Filesystem closed
>         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:703)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:779)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
>         at java.io.DataInputStream.readFully(DataInputStream.java:195)
>         at java.io.DataInputStream.readFully(DataInputStream.java:169)
>         at parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:599)
>         at parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:360)
>         at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:100)
>         at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
>         at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
>         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>         at org.apache.spark.scheduler.Task.run(Task.scala:51)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>
> ulimit is increased
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-1-SQL-on-160-G-parquet-file-snappy-compressed-made-by-cloudera-impala-23-core-and-60G-mem-d-tp10254p10344.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.