I think you are fetching too many results to the driver. Collecting a large amount of data to the driver is generally not recommended, but if you have to, you can increase the driver memory when submitting the job.
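For example, both the driver heap and the result-size cap can be raised at submit time. A minimal sketch (the 4g/2g values, class name, and jar name below are placeholders, not recommendations):

```shell
# Hypothetical spark-submit invocation; tune values to your cluster and job.
spark-submit \
  --driver-memory 4g \
  --conf spark.driver.maxResultSize=2g \
  --class com.example.MyJob \
  my-job.jar
```

The same settings can also be put in spark-defaults.conf as spark.driver.memory and spark.driver.maxResultSize.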
Thanks,
Zhan Zhang

On Dec 11, 2015, at 6:14 AM, Tom Seddon <mr.tom.sed...@gmail.com> wrote:

I have a job that is running into intermittent [SparkDriver] java.lang.OutOfMemoryError: Java heap space errors. Before I hit this error, I was getting errors saying the result size exceeded spark.driver.maxResultSize. This does not make any sense to me, as there are no actions in my job that send data to the driver: just a pull of data from S3, a map and reduceByKey, and then a conversion to a DataFrame and a saveAsTable action that puts the results back on S3. I've found a few references suggesting reduceByKey and spark.driver.maxResultSize are somehow connected, but cannot fathom how this setting could be related.

Would greatly appreciate any advice.

Thanks in advance,
Tom
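For reference, the map/reduceByKey stage of the pipeline described above can be sketched in plain Python. This is only a toy analogue of the per-key reduction semantics (the dataset, helper name, and word-count reduction are illustrative, and nothing here runs on a cluster); it shows that reduceByKey combines values per key rather than gathering raw records anywhere:

```python
from functools import reduce
from collections import defaultdict

def reduce_by_key(pairs, fn):
    """Combine all values sharing a key, mirroring RDD.reduceByKey."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(fn, values) for key, values in groups.items()}

# Toy stand-in for records pulled from S3.
records = ["a", "b", "a", "c", "b", "a"]
# map: tag each record with a count of 1.
pairs = [(r, 1) for r in records]
# reduceByKey: sum the counts per key.
counts = reduce_by_key(pairs, lambda x, y: x + y)
print(counts)  # {'a': 3, 'b': 2, 'c': 1}
```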