You should also try increasing the perm gen size: -XX:MaxPermSize=512m

On Wed, Mar 25, 2015 at 2:37 AM, Ted Yu <yuzhih...@gmail.com> wrote:
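Both suggestions can be applied when launching the shell. A minimal sketch for a Spark 1.x setup (the 4g value is illustrative, pick a size that fits your machine; -XX:MaxPermSize only applies to pre-Java-8 JVMs, which is what Spark 1.3 typically ran on):

```shell
# Illustrative values. Driver memory must be set at launch time,
# since -Xmx cannot be changed once the JVM has started.
./bin/spark-shell \
  --driver-memory 4g \
  --driver-java-options "-XX:MaxPermSize=512m"

# Equivalent settings in conf/spark-defaults.conf:
#   spark.driver.memory            4g
#   spark.driver.extraJavaOptions  -XX:MaxPermSize=512m
```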
> Can you try giving the Spark driver more heap?
>
> Cheers
>
> On Mar 25, 2015, at 2:14 AM, Todd Leo <sliznmail...@gmail.com> wrote:
>
> Hi,
>
> I am using *Spark SQL* to query my *Hive cluster*, following the Spark SQL
> and DataFrame Guide
> <https://spark.apache.org/docs/latest/sql-programming-guide.html> step by
> step. However, my HiveQL via sqlContext.sql() fails and a
> java.lang.OutOfMemoryError is raised. The expected result of the query
> should be small (I added a limit 1000 clause). My code is shown below:
>
> scala> import sqlContext.implicits._
> scala> val df = sqlContext.sql("""select * from some_table where
> logdate="2015-03-24" limit 1000""")
>
> and the error message:
>
> [ERROR] [03/25/2015 16:08:22.379] [sparkDriver-scheduler-27]
> [ActorSystem(sparkDriver)] Uncaught fatal error from thread
> [sparkDriver-scheduler-27] shutting down ActorSystem [sparkDriver]
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> The master heap is set with -Xms512m -Xmx512m, while the workers are set
> with -Xms4096M -Xmx4096M, which I presumed sufficient for this trivial
> query.
>
> Additionally, after restarting the spark-shell and re-running the query
> with limit 5, the df object is returned and can be printed by df.show(),
> but other APIs fail with OutOfMemoryError, namely df.count(),
> df.select("some_field").show() and so forth.
>
> I understand that an RDD can be collected to the master so that further
> transformations can be applied. Since DataFrame has "richer optimizations
> under the hood" and offers a familiar convention for an R/Julia user, I
> really hope this error can be tackled and that DataFrame is robust enough
> to depend on.
>
> Thanks in advance!
>
> REGARDS,
> Todd
>
> ------------------------------
> View this message in context: OutOfMemoryError when using DataFrame
> created by Spark SQL
> <http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-when-using-DataFrame-created-by-Spark-SQL-tp22219.html>
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
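One quick sanity check worth doing before retrying the query: confirm the heap the driver JVM actually received, since memory flags applied in the wrong place (or after startup) are silently ignored. This is a sketch using plain JVM calls, not any Spark-specific API, and can be pasted into spark-shell:

```scala
// Report the driver JVM's maximum heap, in megabytes.
val maxHeapMb = Runtime.getRuntime.maxMemory / (1024L * 1024L)
println(s"Driver max heap: ${maxHeapMb} MB")
// If this still reports roughly 512 MB, the larger -Xmx / --driver-memory
// setting did not take effect, and an OutOfMemoryError on the driver is
// expected for any sizable collected result.
```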