What's the exception you're seeing? Is it an OOM?
On Mon, Jul 21, 2014 at 11:20 AM, chutium <teng....@gmail.com> wrote:
> Hi,
>
> unfortunately it is not so straightforward.
>
> xxx_parquet.db
>
> is a folder for a managed database created by Hive/Impala, so every
> sub-element in it is a table in Hive/Impala. These are folders in HDFS,
> each table has a different schema, and each table's folder contains one
> or more parquet files.
>
> That means
>
> xxxxxx001_suffix
> xxxxxx002_suffix
>
> are folders, and inside them are parquet files such as
>
> xxxxxx001_suffix/parquet_file1_with_schema1
>
> xxxxxx002_suffix/parquet_file1_with_schema2
> xxxxxx002_suffix/parquet_file2_with_schema2
>
> It seems only union can do this job~
>
> Nonetheless, thank you very much; maybe the only reason is that Spark is
> eating up too much memory...
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-1-SQL-on-160-G-parquet-file-snappy-compressed-made-by-cloudera-impala-23-core-and-60G-mem-d-tp10254p10335.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
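
[Editor's note] Since each table folder carries its own schema, the database directory cannot be read in one call; each table folder has to be loaded separately, and a union only makes sense between data sets that share a schema. A minimal sketch against the Spark 1.0.x SQL API, using the placeholder folder names from this thread (the `/path/to` prefix and the table names are hypothetical):

```scala
import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext `sc` (e.g. from spark-shell).
val sqlContext = new SQLContext(sc)

// Load each table folder on its own, since the schemas differ.
// parquetFile reads every parquet file under the given directory,
// so the files within one table folder need no explicit union.
val t1 = sqlContext.parquetFile("/path/to/xxx_parquet.db/xxxxxx001_suffix")
val t2 = sqlContext.parquetFile("/path/to/xxx_parquet.db/xxxxxx002_suffix")

// Register each as its own table for SQL queries.
t1.registerAsTable("table001")
t2.registerAsTable("table002")

// unionAll is only valid between SchemaRDDs with the SAME schema,
// e.g. two folders that both use schema1:
// val combined = t1.unionAll(anotherRddWithSchema1)
```

This matches the observation in the mail: union is the tool for combining same-schema data, but folders with different schemas must stay separate tables.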