Can you try giving the Spark driver more heap?
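
For example, you can raise it when launching the shell, or persistently via spark-defaults.conf (the 2g below is just an illustration, pick a value that suits your machine):

  ./bin/spark-shell --driver-memory 2g

  # or in conf/spark-defaults.conf
  spark.driver.memory   2g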

Cheers



> On Mar 25, 2015, at 2:14 AM, Todd Leo <sliznmail...@gmail.com> wrote:
> 
> Hi,
> 
> I am using Spark SQL to query my Hive cluster, following the Spark SQL and 
> DataFrame Guide step by step. However, my HiveQL query via sqlContext.sql() 
> fails with java.lang.OutOfMemoryError. The expected result of the query is 
> small, since it carries a limit 1000 clause. My code is shown below:
> 
> scala> import sqlContext.implicits._
> 
> scala> val df = sqlContext.sql("""select * from some_table where logdate="2015-03-24" limit 1000""")
> and the error message:
> 
> [ERROR] [03/25/2015 16:08:22.379] [sparkDriver-scheduler-27] 
> [ActorSystem(sparkDriver)] Uncaught fatal error from thread 
> [sparkDriver-scheduler-27] shutting down ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> The master heap memory is set with -Xms512m -Xmx512m, while the workers are 
> set with -Xms4096M -Xmx4096M, which I presume is sufficient for this trivial query.
> 
> Additionally, after restarting the spark-shell and re-running the query with 
> limit 5, the df object is returned and can be printed by df.show(), but other 
> APIs fail with OutOfMemoryError, namely df.count(), df.select("some_field").show(), 
> and so forth.
> 
> I understand that an RDD can be collected to the master so that further 
> transformations can be applied, but since DataFrame has “richer optimizations 
> under the hood” and offers a familiar convention to an R/Julia user, I really 
> hope this error can be tackled and that DataFrame is robust enough to depend on.
> 
> Thanks in advance!
> 
> Regards,
> Todd
> 
