What's the exception you're seeing? Is it an OOM?
On Mon, Jul 21, 2014 at 11:20 AM, chutium <teng....@gmail.com> wrote:
> Hi,
>
> unfortunately it is not so straightforward.
>
> xxx_parquet.db
>
> is a folder for a managed database created by Hive/Impala, so every
> sub-element in it is a table in Hive/Impala. These are folders in HDFS,
> each table has a different schema, and each table's folder contains one
> or more parquet files.
>
> That means
>
> xxxxxx001_suffix
> xxxxxx002_suffix
>
> are folders, and inside them are parquet files such as
>
> xxxxxx001_suffix/parquet_file1_with_schema1
>
> xxxxxx002_suffix/parquet_file1_with_schema2
> xxxxxx002_suffix/parquet_file2_with_schema2
>
> It seems only union can do this job~
>
> Nonetheless, thank you very much; maybe the only reason is that Spark is
> eating up too much memory...
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-1-SQL-on-160-G-parquet-file-snappy-compressed-made-by-cloudera-impala-23-core-and-60G-mem-d-tp10254p10335.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
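
[Editor's note] Since each table folder carries its own schema, the database directory cannot be read in one call; each table folder has to be loaded separately, and a union only makes sense between data sets that share a schema. A minimal sketch against the Spark 1.0.x SQL API, using the placeholder folder names from this thread (the `/path/to` prefix and the table names are hypothetical):

```scala
import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext `sc` (e.g. from spark-shell).
val sqlContext = new SQLContext(sc)

// Load each table folder on its own, since the schemas differ.
// parquetFile reads every parquet file under the given directory,
// so the files within one table folder need no explicit union.
val t1 = sqlContext.parquetFile("/path/to/xxx_parquet.db/xxxxxx001_suffix")
val t2 = sqlContext.parquetFile("/path/to/xxx_parquet.db/xxxxxx002_suffix")

// Register each as its own table for SQL queries.
t1.registerAsTable("table001")
t2.registerAsTable("table002")

// unionAll is only valid between SchemaRDDs with the SAME schema,
// e.g. two folders that both use schema1:
// val combined = t1.unionAll(anotherRddWithSchema1)
```

This matches the observation in the mail: union is the tool for combining same-schema data, but folders with different schemas must stay separate tables.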