This is most likely due to the internal implementation of ALS in MLlib. For each parallel unit of execution (a partition, in Spark terms), the implementation probably allocates a RAM buffer in which it keeps interim results during the ALS iterations.

If we assume that the size of that internal RAM buffer is fixed per unit of execution, then:

    Total RAM (20 partitions x fixed RAM buffer) < Total RAM (100 partitions x fixed RAM buffer)

so going from 20 to 100 partitions multiplies the aggregate buffer footprint by five, even though the data set itself is unchanged.
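If that hypothesis holds, one practical workaround is to decouple the RDD's partition count from the parallelism ALS uses internally. A minimal sketch, assuming the classic org.apache.spark.mllib.recommendation.ALS API; the file paths, input format, and parameter values are placeholders, not anything from your setup:

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    object AlsBlocksSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext() // configuration supplied via spark-submit

        // Hypothetical input: one "user::product::rating" record per line.
        val ratings = sc.textFile("hdfs:///path/to/ratings") // placeholder path
          .map { line =>
            val Array(user, product, rating) = line.split("::")
            Rating(user.toInt, product.toInt, rating.toDouble)
          }
          .repartition(100) // keep high parallelism for the parsing stage

        // The five-argument train() overload takes an explicit `blocks` count;
        // ALS regroups the ratings into that many in/out blocks regardless of
        // how many partitions the input RDD has.
        val model = ALS.train(
          ratings,
          /* rank       = */ 10,
          /* iterations = */ 10,
          /* lambda     = */ 0.01,
          /* blocks     = */ 20) // cap ALS at 20 blocks even with 100 partitions

        model.save(sc, "hdfs:///path/to/model") // placeholder path
      }
    }

Whether this actually avoids the OOM depends on whether the per-block buffers really dominate; it is a way to test the hypothesis rather than a guaranteed fix.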
From: Aniruddh Sharma [mailto:asharma...@gmail.com]
Sent: Wednesday, July 8, 2015 12:22 PM
To: user@spark.apache.org
Subject: Out of Memory Errors on less number of cores in proportion to Partitions in Data

Hi,

I am new to Spark. I have done the following tests and I am confused by the conclusions; I have two queries. The details of the test follow.

Test 1) I used an 11-node cluster where each machine has 64 GB RAM and 4 physical cores. I ran the ALS algorithm from MLlib on a 1.6 GB data set, with 10 executors and a Rating data set of 20 partitions. It works. To increase parallelism, I used 100 partitions instead of 20, and now the program does not work: it throws an out-of-memory error.

Query a) I had 4 cores on each machine, but there are 10 partitions in each executor, so my cores are not sufficient for the partitions. Is it supposed to give memory errors under this kind of misconfiguration? If there are not sufficient cores and processing cannot be done in parallel, could the different partitions not be processed sequentially, so that the operation merely becomes slow instead of throwing a memory error?

Query b) If it does give an error, the error message is not meaningful. Here my DAG was very simple, and I could trace that lowering the number of partitions works; but if it throws errors on a misconfiguration of cores, how does one debug this in complex DAGs, given that the error does not state explicitly that the problem could be due to a low number of cores?

If my understanding is incorrect, kindly explain the reasons for the error in this case.

Thanks and Regards
Aniruddh
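Returning to Query b above: when the error message alone is not informative, one low-tech check is whether the partitioning is out of line with the available cores and memory before the expensive stage runs. A minimal sketch, continuing from the hypothetical `ratings` RDD in the earlier snippet:

    // Make a partitions-vs-resources mismatch visible before ALS runs.
    val numPartitions = ratings.partitions.length
    println(s"ratings has $numPartitions partitions; " +
      s"default parallelism is ${sc.defaultParallelism}")

    // Count records per partition without collecting the data itself.
    val perPartitionCounts = ratings
      .mapPartitionsWithIndex((idx, iter) => Iterator((idx, iter.size)))
      .collect()
    perPartitionCounts.foreach { case (idx, n) =>
      println(s"partition $idx: $n records")
    }

This does not explain the error by itself, but it makes "too many partitions per executor core" visible in the driver output even for a complex DAG.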