Also try to increase the number of partitions gradually – not in one big jump from 20 to 100, but adding e.g. 10 at a time – and see whether there is a correlation with adding more RAM to the executors.
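For what it's worth, a minimal sketch of stepping the partition count up in increments (the input path and step sizes below are illustrative, not from this thread). Note that executor memory itself is fixed per application run (e.g. spark-submit --executor-memory 8g), so each memory setting would be a separate submit:

import org.apache.spark.{SparkConf, SparkContext}

object PartitionStepSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partition-step"))
    // Illustrative input path; the data set in this thread is ~1.6 GB
    val ratings = sc.textFile("hdfs:///data/ratings")

    // Step from 20 towards 100 partitions, 10 at a time
    for (numPartitions <- 20 to 100 by 10) {
      val repartitioned = ratings.repartition(numPartitions)
      // A cheap action forces the shuffle so the memory behaviour
      // of each step shows up in the Spark UI
      println(s"partitions=$numPartitions count=${repartitioned.count()}")
    }
    sc.stop()
  }
}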
From: Evo Eftimov [mailto:evo.efti...@isecc.com]
Sent: Wednesday, July 8, 2015 1:26 PM
To: 'Aniruddh Sharma'; 'user@spark.apache.org'
Subject: RE: Out of Memory Errors on less number of cores in proportion to Partitions in Data

Are you sure you have actually increased the RAM (how exactly did you do that, and does it show in the Spark UI)?

Also use the Spark UI and the driver console to check the RAM allocated for each RDD and each RDD partition in each of the scenarios.

Re b) the general rule is num of partitions = 2 x num of CPU cores. All partitions are operated on in parallel (by independently running JVM threads); however, if you have a substantially higher number of partitions (JVM threads) than number of cores, then you will get what happens in any JVM or OS – there will be switching between the threads and some of them will be in a suspended mode waiting for a free core (thread contexts also occupy additional RAM).

From: Aniruddh Sharma [mailto:asharma...@gmail.com]
Sent: Wednesday, July 8, 2015 12:52 PM
To: Evo Eftimov
Subject: Re: Out of Memory Errors on less number of cores in proportion to Partitions in Data

Thanks for your reply. I increased executor memory from 4 GB to 35 GB and the out of memory error still happens, so it seems it may not be entirely due to more buffers from more partitions.

Query a) Is there a way to debug at a more granular level, from a user-code perspective, where things could go wrong?

Query b) In general, let's suppose it is not ALS (or some other iterative algorithm). Let's say it is some sample RDD, but with 10000 partitions, 50 partitions per executor, and 4 physical cores per machine. Do the 4 physical cores try to process these 50 partitions in parallel (multitasking), or do the 4 cores first process the first 4 partitions, then the next 4 partitions, and so on?

Thanks and Regards
Aniruddh

On Wed, Jul 8, 2015 at 5:09 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

This is most likely due to the internal implementation of ALS in MLlib. Probably for each parallel unit of execution (partition in Spark terms) the implementation allocates and uses a RAM buffer where it keeps interim results during the ALS iterations.

If we assume that the size of that internal RAM buffer is fixed per unit of execution, then total RAM (20 partitions x fixed RAM buffer) < total RAM (100 partitions x fixed RAM buffer).

From: Aniruddh Sharma [mailto:asharma...@gmail.com]
Sent: Wednesday, July 8, 2015 12:22 PM
To: user@spark.apache.org
Subject: Out of Memory Errors on less number of cores in proportion to Partitions in Data

Hi,

I am new to Spark. I have done the following test and I am confused by the conclusions. I have 2 queries. Following is the detail of the test.

Test 1) Used an 11-node cluster where each machine has 64 GB RAM and 4 physical cores. I ran the ALS algorithm from MLlib on a 1.6 GB data set, with 10 executors and a Rating data set of 20 partitions. It works. In order to increase parallelism, I used 100 partitions instead of 20, and now the program does not work: it throws an out of memory error.

Query a): I have 4 cores on each machine, but 10 partitions in each executor, so my cores are not sufficient for the partitions. Is it supposed to give memory errors for this kind of misconfiguration? If there are not sufficient cores and processing cannot be done in parallel, can the different partitions not be processed sequentially, so that the operation becomes slow rather than throwing a memory error?
Query b) If it does give an error, the error message is not meaningful. Here my DAG was very simple and I could trace that lowering the number of partitions works, but if it throws an error because of a misconfiguration of cores, how do I debug it in complex DAGs, since the error does not say explicitly that the problem could be due to too few cores? If my understanding is incorrect, then kindly explain the reasons for the error in this case.

Thanks and Regards
Aniruddh
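For completeness, a minimal sketch (not from the original posts) of tying the number of ALS blocks to the cluster's core count via the blocks argument of MLlib's ALS.train. The input path, the "::" field separator, and the rank, iteration and lambda values are illustrative assumptions:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object AlsBlocksSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("als-blocks"))

    // Illustrative parsing; assumes lines of the form "user::product::rating"
    val ratings = sc.textFile("hdfs:///data/ratings").map { line =>
      val Array(user, product, rating) = line.split("::")
      Rating(user.toInt, product.toInt, rating.toDouble)
    }

    // Rule of thumb from the thread: roughly 2 x the total number of CPU cores
    val blocks = 2 * sc.defaultParallelism

    // rank = 10, iterations = 10, lambda = 0.01 are placeholder values
    val model = ALS.train(ratings, 10, 10, 0.01, blocks)
    println(s"trained with $blocks blocks, rank ${model.rank}")
    sc.stop()
  }
}

Each block carries its own in-flight buffers during the iterations, which is consistent with the observation above that going from 20 to 100 partitions raises the total memory footprint even though the data set size stays the same.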