Hi,

I am new to Spark. I have run the test below and am confused by the
results. I have two queries.

Following are the details of the test:

Test 1) Used an 11-node cluster where each machine has 64 GB RAM and 4
physical cores. I ran the ALS algorithm from MLlib on a 1.6 GB data set,
with 10 executors and a Rating data set of 20 partitions. It works. In
order to increase parallelism, I used 100 partitions instead of 20, and now
the program does not work and throws an out-of-memory error.
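
For reference, here is a minimal sketch of roughly what I am running (in
spark-shell, where sc is the provided SparkContext); the input path and the
ALS parameters (rank, iterations, lambda) are illustrative placeholders,
not my exact values:

import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Parse the 1.6 GB input into Rating objects; the path is a placeholder.
val ratings = sc.textFile("hdfs:///data/ratings.csv")
  .map { line =>
    val Array(user, product, rating) = line.split(',')
    Rating(user.toInt, product.toInt, rating.toDouble)
  }
  .repartition(100) // works when this is 20, throws OOM when it is 100

// rank = 10, iterations = 10, lambda = 0.01 are illustrative values.
val model = ALS.train(ratings, 10, 10, 0.01)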

Query a) Since I had 4 cores on each machine but 10 partitions per
executor, my cores were not sufficient for the partitions. Is it supposed
to give memory errors under this kind of misconfiguration? If there are not
sufficient cores and processing cannot be done fully in parallel, could the
different partitions not be processed sequentially, so that the operation
merely becomes slow rather than throwing a memory error?
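
For context, this is roughly how the job is submitted; the executor memory
setting and the jar/class names below are illustrative placeholders:

spark-submit \
  --master yarn \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 48g \
  --class ALSTest \
  als-test.jar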

Query b) If it does give an error, the error message is not meaningful.
Here my DAG was very simple and I could trace that lowering the number of
partitions makes the job work; but if a misconfiguration of cores throws
this error, then how does one debug it in complex DAGs, since the error
does not say explicitly that the problem could be due to a low number of
cores? If my understanding is incorrect, kindly explain the reason for the
error in this case.

Thanks and Regards
Aniruddh
