I think you are fetching too many results to the driver. Collecting a large amount of data to the driver is generally not recommended, but if you have to, you can increase the driver memory when submitting the job.
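For example, both the driver heap and the result-size cap can be raised at submit time. A minimal sketch (the 4g/2g values, class name, and jar name below are placeholders, not recommendations):

```shell
# Hypothetical spark-submit invocation; tune values to your cluster and job.
spark-submit \
  --driver-memory 4g \
  --conf spark.driver.maxResultSize=2g \
  --class com.example.MyJob \
  my-job.jar
```

The same settings can also be put in spark-defaults.conf as spark.driver.memory and spark.driver.maxResultSize.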
Thanks,
Zhan Zhang

On Dec 11, 2015, at 6:14 AM, Tom Seddon <mr.tom.sed...@gmail.com> wrote:

I have a job that is running into intermittent [SparkDriver] java.lang.OutOfMemoryError: Java heap space errors. Before I hit this error, I was getting errors saying the result size exceeded spark.driver.maxResultSize. This does not make any sense to me, as there are no actions in my job that send data to the driver: just a pull of data from S3, a map and reduceByKey, and then a conversion to a DataFrame and a saveAsTable action that puts the results back on S3. I've found a few references suggesting reduceByKey and spark.driver.maxResultSize are somehow connected, but cannot fathom how this setting could be related.

Would greatly appreciate any advice.

Thanks in advance,
Tom
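For reference, the map/reduceByKey stage of the pipeline described above can be sketched in plain Python. This is only a toy analogue of the per-key reduction semantics (the dataset, helper name, and word-count reduction are illustrative, and nothing here runs on a cluster); it shows that reduceByKey combines values per key rather than gathering raw records anywhere:

```python
from functools import reduce
from collections import defaultdict

def reduce_by_key(pairs, fn):
    """Combine all values sharing a key, mirroring RDD.reduceByKey."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(fn, values) for key, values in groups.items()}

# Toy stand-in for records pulled from S3.
records = ["a", "b", "a", "c", "b", "a"]
# map: tag each record with a count of 1.
pairs = [(r, 1) for r in records]
# reduceByKey: sum the counts per key.
counts = reduce_by_key(pairs, lambda x, y: x + y)
print(counts)  # {'a': 3, 'b': 2, 'c': 1}
```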