RE: OutOfMemory Error

Shao, Saisai Wed, 20 Aug 2014 02:20:08 -0700

Hi Meethu,

spark.executor.memory is the Java heap size of the forked executor process, so
increasing spark.executor.memory does increase the runtime heap size of the
executor process. In other words, for an executor the two are the same thing.

For details on Spark configuration properties, see:
http://spark.apache.org/docs/latest/configuration.html
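Increasing the heap therefore means setting spark.executor.memory before the executors are launched. The Python sketch below only renders the two launch-time forms of that setting: the spark-submit flag and a conf/spark-defaults.conf line (key and value separated by whitespace). The helper name `executor_memory_settings` is hypothetical, and the 4g value is an arbitrary illustration rather than a recommendation; on 5GB nodes like the ones described later in this thread, some memory has to remain for the OS and other processes. In application code, the equivalent is `SparkConf().set("spark.executor.memory", "4g")` before the SparkContext is created.

```python
# Hypothetical helper: renders a spark.executor.memory value in the two
# launch-time forms Spark accepts (spark-submit flag, spark-defaults.conf line).
def executor_memory_settings(heap_gb: int) -> dict:
    """Return launch-time settings for an executor JVM heap of heap_gb gigabytes."""
    if heap_gb <= 0:
        raise ValueError("heap size must be positive")
    value = f"{heap_gb}g"  # Spark memory strings use JVM-style suffixes, e.g. "512m" or "4g"
    return {
        # Command-line form understood by spark-submit
        "spark_submit_args": ["--executor-memory", value],
        # Property form: one "key value" pair per line in conf/spark-defaults.conf
        "spark_defaults_line": f"spark.executor.memory {value}",
    }


settings = executor_memory_settings(4)
print(" ".join(settings["spark_submit_args"]))  # --executor-memory 4g
print(settings["spark_defaults_line"])          # spark.executor.memory 4g
```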

Thanks
Jerry

From: MEETHU MATHEW [mailto:[email protected]]
Sent: Wednesday, August 20, 2014 4:48 PM
To: Akhil Das; Ghousia
Cc: [email protected]
Subject: Re: OutOfMemory Error

Hi,

How do I increase the heap size?

What is the difference between Spark executor memory and heap size?

Thanks & Regards,
Meethu M

On Monday, 18 August 2014 12:35 PM, Akhil Das <[email protected]> wrote:

I believe spark.shuffle.memoryFraction is the one you are looking for.

spark.shuffle.memoryFraction: Fraction of Java heap to use for aggregation and
cogroups during shuffles, if spark.shuffle.spill is true. At any given time,
the collective size of all in-memory maps used for shuffles is bounded by this
limit, beyond which the contents will begin to spill to disk. If spills are
frequent, consider increasing this value at the expense of
spark.storage.memoryFraction.
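The trade-off described above can be checked with simple arithmetic. This Python sketch assumes a hypothetical 4 GiB executor heap and computes the upper bound each fraction places on in-memory shuffle maps and on cached RDD blocks. It deliberately ignores any additional safety margins Spark may apply internally, so real limits can be lower; the function name `memory_budgets` and the rule that the two fractions may not sum above 1 are choices made for this sketch, not Spark behavior.

```python
GIB = 1024 ** 3


def memory_budgets(heap_bytes: int, shuffle_fraction: float, storage_fraction: float) -> dict:
    """Upper bounds on shuffle-map memory and cached-block memory for one executor heap.

    Simplified model: only the two fractions are applied; internal safety
    margins are not modelled.
    """
    if not (0 <= shuffle_fraction <= 1 and 0 <= storage_fraction <= 1):
        raise ValueError("fractions must lie in [0, 1]")
    if shuffle_fraction + storage_fraction > 1:
        # Sketch's own rule: together they cannot promise more than the whole heap
        raise ValueError("shuffle and storage fractions together exceed the heap")
    shuffle = int(heap_bytes * shuffle_fraction)  # beyond this, shuffle maps spill to disk
    storage = int(heap_bytes * storage_fraction)  # room for persisted (cached) RDD blocks
    return {
        "shuffle_bytes": shuffle,
        "storage_bytes": storage,
        "other_bytes": heap_bytes - shuffle - storage,  # left for everything else in the task
    }


heap = 4 * GIB  # hypothetical executor heap
for shuffle_f, storage_f in [(0.2, 0.6), (0.4, 0.4)]:
    b = memory_budgets(heap, shuffle_f, storage_f)
    print(f"shuffle={shuffle_f}: {b['shuffle_bytes'] / GIB:.2f} GiB for shuffle maps, "
          f"{b['storage_bytes'] / GIB:.2f} GiB for cached blocks")
```

With these illustrative numbers, moving spark.shuffle.memoryFraction from 0.2 to 0.4 doubles the room for in-memory shuffle maps (0.8 GiB to 1.6 GiB) while the space for cached blocks shrinks from 2.4 GiB to 1.6 GiB.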

You can give it a try.

Thanks
Best Regards

On Mon, Aug 18, 2014 at 12:21 PM, Ghousia <[email protected]> wrote:
Thanks for the answer, Akhil. For now we are working around this issue by
increasing the number of partitions, and we are persisting RDDs with DISK_ONLY.
But the problem lies in heavy computations within an RDD. It would be better if
we had the option of spilling intermediate transformation results to local
disk (only when memory consumption is high). Is there any such option
available in Spark? If increasing the number of partitions is the only way,
one might still end up with OutOfMemory errors when working with certain
algorithms whose intermediate results are huge.
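The spilling Ghousia asks about can be illustrated with a toy, single-process Python sketch: an aggregation map that writes its partial results to a temporary file as a sorted run whenever it reaches a size limit, then merges all runs key by key at the end. This is not Spark's implementation. The function names are hypothetical, and a key count stands in for a real memory limit. Per the description quoted in this thread, Spark's own spilling (spark.shuffle.spill) covers the in-memory maps used for aggregation and cogroups during shuffles.

```python
import heapq
import os
import pickle
import tempfile
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def _spill(partial):
    """Write one sorted run of (key, partial_sum) pairs to a temp file; return its path."""
    fd, path = tempfile.mkstemp(suffix=".spill")
    with os.fdopen(fd, "wb") as f:
        for item in sorted(partial.items()):
            pickle.dump(item, f)
    return path


def _read_run(path):
    """Stream (key, partial_sum) pairs back from a spill file without loading it whole."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def spilling_sum_by_key(pairs, max_keys_in_memory):
    """Yield (key, total) in key order, spilling the in-memory map whenever it hits the limit."""
    if max_keys_in_memory < 1:
        raise ValueError("max_keys_in_memory must be at least 1")
    current = defaultdict(int)
    spill_paths = []
    try:
        for key, value in pairs:
            current[key] += value
            if len(current) >= max_keys_in_memory:
                spill_paths.append(_spill(current))  # memory "full": move partial sums to disk
                current = defaultdict(int)
        # Every run is sorted by key, so a streaming k-way merge groups equal keys together.
        runs = [sorted(current.items())] + [_read_run(p) for p in spill_paths]
        for key, group in groupby(heapq.merge(*runs, key=itemgetter(0)), key=itemgetter(0)):
            yield key, sum(v for _, v in group)
    finally:
        for path in spill_paths:
            os.remove(path)  # clean up the temporary spill files


pairs = [("b", 1), ("a", 2), ("c", 3), ("a", 4), ("b", 5), ("d", 6)]
print(list(spilling_sum_by_key(pairs, max_keys_in_memory=2)))
# [('a', 6), ('b', 6), ('c', 3), ('d', 6)]
```

The design choice that makes this work is writing each spill as a sorted run: the final merge then only needs one record per run in memory at a time, regardless of how much was spilled.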

On Mon, Aug 18, 2014 at 12:02 PM, Akhil Das <[email protected]> wrote:
Hi Ghousia,

You can try the following:

1. Increase the heap size (https://spark.apache.org/docs/0.9.0/configuration.html)
2. Increase the number of partitions (http://stackoverflow.com/questions/21698443/spark-best-practice-for-retrieving-big-data-from-rdd-to-local-machine)
3. Try persisting the RDD with the DISK_ONLY storage level (http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence)
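Option 2 can be made less of a guess with a rough sizing rule: divide an estimate of the largest intermediate result by the memory a single task can safely use. The helper below is hypothetical, not a Spark API; both inputs are estimates you would have to supply, the 40 GiB / 512 MiB figures are illustrative only, and the rule assumes data is spread evenly across partitions (skewed keys need extra headroom).

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def partitions_needed(estimated_intermediate_bytes: int, per_task_budget_bytes: int) -> int:
    """Smallest partition count keeping each partition's share of the data within the budget.

    Assumes an even spread of data across partitions (no key skew).
    """
    if estimated_intermediate_bytes < 0:
        raise ValueError("estimated size cannot be negative")
    if per_task_budget_bytes <= 0:
        raise ValueError("per-task budget must be positive")
    # Integer ceiling division avoids float rounding on large byte counts
    return max(1, -(-estimated_intermediate_bytes // per_task_budget_bytes))


print(partitions_needed(40 * GIB, 512 * MIB))  # 80
```

The resulting count could then be passed to `rdd.repartition(n)` or to the `numPartitions` argument that shuffle operations such as `reduceByKey` accept; option 3 corresponds to `rdd.persist(StorageLevel.DISK_ONLY)` in the RDD API.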

Thanks
Best Regards

On Mon, Aug 18, 2014 at 10:40 AM, Ghousia Taj <[email protected]> wrote:
Hi,

I am trying to implement machine learning algorithms on Spark. I am working
on a 3-node cluster, with 5GB of memory on each node. Whenever I work with a
slightly larger number of records, I end up with an OutOfMemory error. The
problem is that even when the number of records is only slightly higher, the
intermediate result of a transformation is huge, and this causes the
OutOfMemory error. To work around this, we are partitioning the data so that
each partition holds only a few records.

Is there a better way to fix this issue, something like spilling the
intermediate data to local disk?

Thanks,
Ghousia.
