Hi,

I don't think anybody has tested importing Impala tables directly. Is
there any chance you could export them first, say as unpartitioned Hive
tables, and then import those? Just an idea.
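
Something along these lines, just as a rough sketch (the table names, the
warehouse path and the exact CTAS syntax below are placeholders; adjust
them for your setup):

  // On the Hive/Impala side, write out an unpartitioned copy first, e.g.:
  //   CREATE TABLE my_table_flat STORED AS PARQUET AS SELECT * FROM my_partitioned_table;
  // Then try reading that copy with the Spark 1.0.1 SQL API:
  import org.apache.spark.sql.SQLContext

  val sqlContext = new SQLContext(sc)  // sc is the existing SparkContext (e.g. in spark-shell)
  val flat = sqlContext.parquetFile("hdfs:///user/hive/warehouse/my_table_flat")
  flat.registerAsTable("my_table_flat")
  sqlContext.sql("SELECT COUNT(*) FROM my_table_flat").collect()

If reading that unpartitioned copy works, it would at least narrow the
problem down to the Impala-written layout rather than Spark's Parquet
support in general.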

Andre

On 07/21/2014 11:46 PM, chutium wrote:
> no, something like this
> 
> 14/07/20 00:19:29 ERROR cluster.YarnClientClusterScheduler: Lost executor 2 on xxxxxxxx02.xxx: remote Akka client disassociated
> 
> ...
> ...
> 
> 14/07/20 00:21:13 WARN scheduler.TaskSetManager: Lost TID 832 (task 1.2:186)
> 14/07/20 00:21:13 WARN scheduler.TaskSetManager: Loss was due to java.io.IOException
> java.io.IOException: Filesystem closed
>         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:703)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:779)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
>         at java.io.DataInputStream.readFully(DataInputStream.java:195)
>         at java.io.DataInputStream.readFully(DataInputStream.java:169)
>         at parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:599)
>         at parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:360)
>         at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:100)
>         at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
>         at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
>         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>         at org.apache.spark.scheduler.Task.run(Task.scala:51)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> 
> 
> the ulimit has already been increased
> 
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-1-SQL-on-160-G-parquet-file-snappy-compressed-made-by-cloudera-impala-23-core-and-60G-mem-d-tp10254p10344.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
