the actual case looks like this:

* spark 1.1.0 on yarn (cdh 5.2.1)
* ~8-10 executors, 36GB phys RAM per host
* input RDD is roughly 3GB containing ~150-200M items (and this RDD is made persistent using .cache())
* using pyspark

yarn is configured with the limit yarn.nodemanager.resource.memory-mb of 33792 (33GB), and spark is set to:

SPARK_EXECUTOR_CORES=6
SPARK_EXECUTOR_INSTANCES=9
SPARK_EXECUTOR_MEMORY=30G

when using a higher rank (above 20) for ALS.trainImplicit, the executor exceeds the yarn container limit after some time (~hour) of execution and gets killed:

2015-01-09 17:51:27,130 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=27125,containerID=container_1420871936411_0002_01_000023] is running beyond physical memory limits. Current usage: 31.2 GB of 31 GB physical memory used; 34.7 GB of 65.1 GB virtual memory used. Killing container.

thanks for any ideas,
Antony.
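for reference, a rough sketch of how the 31 GB container limit in the log could follow from SPARK_EXECUTOR_MEMORY=30G. This is only an illustration under assumptions that are not stated in the message: that the per-executor overhead (spark.yarn.executor.memoryOverhead) is at a default of 384 MB, and that yarn rounds container requests up to a multiple of 1024 MB. The function name container_limit_mb is made up for this example.

```python
import math


def container_limit_mb(executor_memory_mb, overhead_mb=384, increment_mb=1024):
    """Estimate the yarn container size for one executor.

    overhead_mb and increment_mb are ASSUMED values (a 384 MB overhead and
    a 1024 MB yarn allocation increment); check the actual cluster config.
    """
    requested_mb = executor_memory_mb + overhead_mb
    # Round the request up to the next multiple of the allocation increment.
    return int(math.ceil(requested_mb / float(increment_mb))) * increment_mb


executor_memory_mb = 30 * 1024   # SPARK_EXECUTOR_MEMORY=30G
node_limit_mb = 33792            # yarn.nodemanager.resource.memory-mb

limit_mb = container_limit_mb(executor_memory_mb)
headroom_mb = limit_mb - executor_memory_mb

print("container limit: %d MB (%.1f GB)" % (limit_mb, limit_mb / 1024.0))
print("room beyond the 30G heap setting: %d MB" % headroom_mb)
print("left on the node after one container: %d MB" % (node_limit_mb - limit_mb))
```

under these assumptions the container comes out at 31744 MB (31 GB), which matches the limit in the log, and leaves about 1 GB inside the container for everything that is not covered by the 30G executor memory setting.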
On Saturday, 10 January 2015, 10:11, Antony Mayi <antonym...@yahoo.com> wrote:

the memory requirements seem to grow rapidly when using a higher rank... I am unable to get over 20 without running out of memory. is this expected?

thanks,
Antony.