Aakash,

Like Jörn suggested, did you increase your test data set? If so, did you also 
update your executor-memory setting? It seems like you might be exceeding the 
executor memory threshold.
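A rough sketch of why containers get killed for memory: on YARN-style cluster managers, the container limit for an executor is executor-memory plus a memory overhead, which by default is the larger of 384 MB and 10% of executor memory. The exact defaults depend on your Spark version and cluster manager, so treat the numbers below as an assumption, not a statement about this cluster:

```python
# Rough sketch (assumption: YARN-style defaults) of the container memory
# limit one executor must stay under: executor-memory plus an overhead of
# max(384 MB, 10% of executor-memory).

def container_limit_mb(executor_memory_mb):
    """Approximate total container memory for one executor, in MB."""
    overhead_mb = max(384, int(0.10 * executor_memory_mb))
    return executor_memory_mb + overhead_mb

# With --executor-memory 4G (4096 MB) the container must fit ~4505 MB:
print(container_limit_mb(4096))  # 4505
```

If the larger test data pushes an executor past that limit, the container is killed, which matches the "containers exceeding thresholds" message in the log below.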

Thanks
Vamshi Talla

Sent from my iPhone

On Jun 11, 2018, at 8:54 AM, Aakash Basu 
<aakash.spark....@gmail.com> wrote:

Hi Jorn/Others,

Thanks for your help. Now the data is being distributed properly, but the 
challenge is that after a certain point I get this error, after which 
everything stops moving ahead -

2018-06-11 18:14:56 ERROR TaskSchedulerImpl:70 - Lost executor 0 on 
192.168.49.39: Remote RPC client disassociated. Likely due to containers 
exceeding thresholds, or network issues. Check driver logs for WARN messages.

<image.png>

How to avoid this scenario?

Thanks,
Aakash.

On Mon, Jun 11, 2018 at 4:16 PM, Jörn Franke 
<jornfra...@gmail.com> wrote:
If it is in KB, then Spark will always schedule it to one node. As soon as it 
gets bigger, you will see usage of more nodes.

Hence, increase your test dataset.
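If the goal is only to see the KB-sized test data spread across the workers without growing it, another option (my own assumption, not something suggested in this thread) is to raise the parallelism so Spark splits the work into more tasks, or to call `df.repartition(n)` inside the script. The master URL and script path below are taken from the original submit command; the partition count is an assumption:

```shell
# Hedged sketch: raise default parallelism so even a small dataset is split
# across the three workers. 12 = 3 executors x 4 cores is an assumption.
spark-submit --master spark://192.168.49.37:7077 \
  --num-executors 3 --executor-cores 4 --executor-memory 2G \
  --conf spark.default.parallelism=12 \
  --conf spark.sql.shuffle.partitions=12 \
  /appdata/bblite-codebase/prima_diabetes_indians.py
```

Note that for a tiny input file the initial read may still produce a single partition, so an explicit repartition in the script is the more reliable of the two.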

On 11. Jun 2018, at 12:22, Aakash Basu 
<aakash.spark....@gmail.com> wrote:

Jörn - The code is a series of feature engineering and model tuning 
operations, too big to show. Yes, the data volume is very low, in KBs; I just 
tried to experiment with a small dataset before going for a large one.

Akshay - I ran with your suggested Spark configurations, and I get this (the 
node changed, but the problem persists) -

<image.png>



On Mon, Jun 11, 2018 at 3:16 PM, akshay naidu 
<akshaynaid...@gmail.com> wrote:
try
 --num-executors 3 --executor-cores 4 --executor-memory 2G --conf 
spark.scheduler.mode=FAIR

On Mon, Jun 11, 2018 at 2:43 PM, Aakash Basu 
<aakash.spark....@gmail.com> wrote:
Hi,

I have submitted a job on a 4-node cluster, where I see most of the operations 
happening at one of the worker nodes while the other two are simply chilling out.

The picture below sheds light on that -
[cid:]
How to properly distribute the load?

My cluster conf (4 node cluster [1 driver; 3 slaves]) -

Cores - 6
RAM - 12 GB
HDD - 60 GB

My Spark Submit command is as follows -

spark-submit --master spark://192.168.49.37:7077 --num-executors 3 
--executor-cores 5 --executor-memory 4G 
/appdata/bblite-codebase/prima_diabetes_indians.py
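As a quick sanity check on that submit command (assuming the "Cores - 6 / RAM - 12 GB" figures above are per worker node, which the mail does not actually say), here is a back-of-the-envelope on how many such executors one worker can host:

```python
# Back-of-the-envelope fit check; assumes "Cores - 6 / RAM - 12 GB" is
# per worker node (an assumption, not stated in the mail).
node_cores, node_ram_gb = 6, 12
exec_cores, exec_mem_gb = 5, 4

# How many executors of this size fit on a single worker?
fit_by_cores = node_cores // exec_cores   # 6 // 5 = 1
fit_by_ram = node_ram_gb // exec_mem_gb   # 12 // 4 = 3 (ignoring overhead/OS)
executors_per_node = min(fit_by_cores, fit_by_ram)
print(executors_per_node)  # 1 -> each worker can run at most one executor
```

So with --executor-cores 5 each worker can hold only one executor, which is fine for spreading three executors over three slaves; with a KB-sized dataset, though, Spark may still schedule all the tasks onto one of them.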

What to do?

Thanks,
Aakash.


