Although this helped to improve it significantly, I still run into this problem despite vastly increasing spark.yarn.executor.memoryOverhead:

export SPARK_EXECUTOR_MEMORY=24G
spark.yarn.executor.memoryOverhead=6144
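For reference, a minimal sketch of how these settings can be applied programmatically (assumes PySpark 1.1 in yarn-client mode, with master and deploy mode coming from spark-submit; not the exact launcher used here):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.executor.memory", "24g")                 # same as SPARK_EXECUTOR_MEMORY=24G
            .set("spark.yarn.executor.memoryOverhead", "6144"))  # MB of headroom outside the JVM heap

    # The YARN container request is roughly spark.executor.memory + memoryOverhead,
    # i.e. 24 GB + 6 GB = 30 GB, which matches the "30 GB physical memory" limit
    # reported in the log below.
    sc = SparkContext(conf=conf)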
Yet I am still getting this:

2015-01-17 04:47:40,389 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=30211,containerID=container_1421451766649_0002_01_115969] is running beyond physical memory limits. Current usage: 30.1 GB of 30 GB physical memory used; 33.0 GB of 63.0 GB virtual memory used. Killing container.

Is there anything more I can do?

thanks,
Antony.


On Monday, 12 January 2015, 8:21, Antony Mayi <antonym...@yahoo.com> wrote:

this seems to have sorted it, awesome, thanks for the great help.
Antony.


On Sunday, 11 January 2015, 13:02, Sean Owen <so...@cloudera.com> wrote:

I would expect the size of the user/item feature RDDs to grow linearly with the rank, of course. They are cached, so that would drive cache memory usage on the cluster. This wouldn't cause executors to fail for running out of memory, though.

In fact, your error does not show the task failing for lack of memory. What it shows is that YARN thinks the task is using a little bit more memory than it said it would, and killed it. This happens sometimes with JVM-based YARN jobs, since a JVM configured to use X heap ends up using a bit more than X physical memory if the heap reaches max size. So there's a bit of headroom built in, controlled by spark.yarn.executor.memoryOverhead (http://spark.apache.org/docs/latest/running-on-yarn.html). You can try increasing it to a couple GB.

On Sun, Jan 11, 2015 at 9:43 AM, Antony Mayi <antonym...@yahoo.com.invalid> wrote:
> the question really is whether it is expected that the memory requirements
> grow rapidly with the rank... I would expect memory to be rather an O(1)
> problem, depending only on the size of the input data.
>
> if this is expected, is there any rough formula to determine the required
> memory based on the ALS input and parameters?
>
> thanks,
> Antony.
>
>
> On Saturday, 10 January 2015, 10:47, Antony Mayi <antonym...@yahoo.com>
> wrote:
>
>
> the actual case looks like this:
> * spark 1.1.0 on yarn (cdh 5.2.1)
> * ~8-10 executors, 36GB phys RAM per host
> * input RDD is roughly 3GB containing ~150-200M items (and this RDD is made
> persistent using .cache())
> * using pyspark
>
> yarn is configured with the limit yarn.nodemanager.resource.memory-mb of
> 33792 (33GB), spark is set to be:
> SPARK_EXECUTOR_CORES=6
> SPARK_EXECUTOR_INSTANCES=9
> SPARK_EXECUTOR_MEMORY=30G
>
> when using a higher rank (above 20) for ALS.trainImplicit, the executor runs
> out of the yarn limit after some time (~hour) of execution and gets killed:
>
> 2015-01-09 17:51:27,130 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Container [pid=27125,containerID=container_1420871936411_0002_01_000023] is
> running beyond physical memory limits. Current usage: 31.2 GB of 31 GB
> physical memory used; 34.7 GB of 65.1 GB virtual memory used. Killing
> container.
>
> thanks for any ideas,
> Antony.
>
>
> On Saturday, 10 January 2015, 10:11, Antony Mayi <antonym...@yahoo.com>
> wrote:
>
>
> the memory requirements seem to be rapidly growing when using a higher rank...
> I am unable to get over 20 without running out of memory. is this expected?
> thanks, Antony.
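For reference, a rough sketch of the kind of PySpark ALS job discussed in this thread (the input path and parsing are hypothetical; assumes the Spark 1.1 MLlib Python API):

    from pyspark import SparkContext
    from pyspark.mllib.recommendation import ALS

    sc = SparkContext(appName="als-implicit-sketch")

    # hypothetical input: one "user,item,count" record per line
    raw = sc.textFile("hdfs:///path/to/interactions.csv")
    ratings = (raw.map(lambda line: line.split(","))
                  .map(lambda f: (int(f[0]), int(f[1]), float(f[2])))
                  .cache())

    # the user/item factor RDDs grow roughly linearly with rank, so a higher
    # rank means more cached data and more executor memory pressure
    model = ALS.trainImplicit(ratings, rank=20, iterations=10, alpha=0.01)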