You should also try increasing the perm gen size: -XX:MaxPermSize=512m

On Wed, Mar 25, 2015 at 2:37 AM, Ted Yu <yuzhih...@gmail.com> wrote:
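Both suggestions can be applied when launching the shell. A minimal sketch for a Spark 1.x setup (the 4g value is illustrative, pick a size that fits your machine; -XX:MaxPermSize only applies to pre-Java-8 JVMs, which is what Spark 1.3 typically ran on):

```shell
# Illustrative values. Driver memory must be set at launch time,
# since -Xmx cannot be changed once the JVM has started.
./bin/spark-shell \
  --driver-memory 4g \
  --driver-java-options "-XX:MaxPermSize=512m"

# Equivalent settings in conf/spark-defaults.conf:
#   spark.driver.memory            4g
#   spark.driver.extraJavaOptions  -XX:MaxPermSize=512m
```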
> Can you try giving the Spark driver more heap?
>
> Cheers
>
> On Mar 25, 2015, at 2:14 AM, Todd Leo <sliznmail...@gmail.com> wrote:
>
> Hi,
>
> I am using *Spark SQL* to query my *Hive cluster*, following the Spark SQL
> and DataFrame Guide
> <https://spark.apache.org/docs/latest/sql-programming-guide.html> step by
> step. However, my HiveQL via sqlContext.sql() fails and a
> java.lang.OutOfMemoryError is raised. The expected result of the query
> should be small (I added a limit 1000 clause). My code is shown below:
>
> scala> import sqlContext.implicits._
> scala> val df = sqlContext.sql("""select * from some_table where
> logdate="2015-03-24" limit 1000""")
>
> and the error message:
>
> [ERROR] [03/25/2015 16:08:22.379] [sparkDriver-scheduler-27]
> [ActorSystem(sparkDriver)] Uncaught fatal error from thread
> [sparkDriver-scheduler-27] shutting down ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> The master heap is set with -Xms512m -Xmx512m, while the workers are set
> with -Xms4096M -Xmx4096M, which I presumed sufficient for this trivial
> query.
>
> Additionally, after restarting the spark-shell and re-running the query
> with limit 5, the df object is returned and can be printed by df.show(),
> but other APIs fail with OutOfMemoryError, namely df.count(),
> df.select("some_field").show() and so forth.
>
> I understand that an RDD can be collected to the master so that further
> transformations can be applied. Since DataFrame has "richer optimizations
> under the hood" and offers a familiar convention for an R/Julia user, I
> really hope this error can be tackled and that DataFrame is robust enough
> to depend on.
>
> Thanks in advance!
>
> REGARDS,
> Todd
>
> ------------------------------
> View this message in context: OutOfMemoryError when using DataFrame
> created by Spark SQL
> <http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-when-using-DataFrame-created-by-Spark-SQL-tp22219.html>
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
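One quick sanity check worth doing before retrying the query: confirm the heap the driver JVM actually received, since memory flags applied in the wrong place (or after startup) are silently ignored. This is a sketch using plain JVM calls, not any Spark-specific API, and can be pasted into spark-shell:

```scala
// Report the driver JVM's maximum heap, in megabytes.
val maxHeapMb = Runtime.getRuntime.maxMemory / (1024L * 1024L)
println(s"Driver max heap: ${maxHeapMb} MB")
// If this still reports roughly 512 MB, the larger -Xmx / --driver-memory
// setting did not take effect, and an OutOfMemoryError on the driver is
// expected for any sizable collected result.
```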