Yes, but when I increased my executor memory, the Spark job halts after
running a few steps, even though the executor isn't dying.

Data - 60,000 rows, 230 columns (about 60 MB).

Any input on why it behaves like that?
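
For reference, here is a quick way to check how those 60,000 rows are split
across partitions before the job stalls (a sketch, not from this thread; the
CSV path is a placeholder):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # placeholder path - the actual input file isn't shown in this thread
    df = spark.read.csv("/path/to/data.csv", header=True, inferSchema=True)
    print("partitions:", df.rdd.getNumPartitions())
    # rows held by each partition; one large entry means a single executor
    # is doing nearly all of the work
    print(df.rdd.glom().map(len).collect())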

On Tue, Jun 12, 2018 at 8:15 AM, Vamshi Talla <vamsh...@hotmail.com> wrote:

> Aakash,
>
> Like Jorn suggested, did you increase your test data set? If so, did you
> also update your executor-memory setting? It seems like you might be
> exceeding the executor memory threshold.
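>
> One quick sanity check (a sketch, not from this thread): read the setting
> back from inside the running job to confirm it actually took effect -
>
>     print(spark.sparkContext.getConf().get("spark.executor.memory"))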
>
> Thanks
> Vamshi Talla
>
> Sent from my iPhone
>
> On Jun 11, 2018, at 8:54 AM, Aakash Basu <aakash.spark....@gmail.com>
> wrote:
>
> Hi Jorn/Others,
>
> Thanks for your help. Now, data is being distributed in a proper way, but
> the challenge is, after a certain point, I'm getting this error, after
> which, everything stops moving ahead -
>
> 2018-06-11 18:14:56 ERROR TaskSchedulerImpl:70 - Lost executor 0 on
> 192.168.49.39: Remote RPC client disassociated. Likely due to containers
> exceeding thresholds, or network issues. Check driver logs for WARN messages.
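>
> If the cause really is memory rather than the network (an assumption - the
> message names both), the commonly cited knob on YARN or Kubernetes is the
> executor memory overhead (spark.executor.memoryOverhead in Spark 2.3+), e.g.:
>
>     spark-submit ... --conf spark.executor.memoryOverhead=1024 ...
>
> On a standalone master like the one in this thread, the worker log on
> 192.168.49.39 would be the place to confirm why the executor was lost.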
>
> <image.png>
>
> How to avoid this scenario?
>
> Thanks,
> Aakash.
>
> On Mon, Jun 11, 2018 at 4:16 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> If it is in KB, then Spark will always schedule it to one node. As soon
>> as it gets bigger, you will see more nodes being used.
>>
>> Hence, increase your test dataset.
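>>
>> Alternatively, a small dataset can be spread across the cluster by
>> repartitioning it explicitly (a sketch, assuming a DataFrame named df;
>> 12 is just an illustrative partition count):
>>
>>     df = df.repartition(12)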
>>
>> On 11. Jun 2018, at 12:22, Aakash Basu <aakash.spark....@gmail.com>
>> wrote:
>>
>> Jorn - The code is a series of feature engineering and model tuning
>> operations, too big to show. Yes, the data volume is very low - it is in
>> KBs - as I just wanted to experiment with a small dataset before going
>> for a large one.
>>
>> Akshay - I ran with your suggested Spark configuration and I get this
>> (the node changed, but the problem persists) -
>>
>> <image.png>
>>
>>
>>
>> On Mon, Jun 11, 2018 at 3:16 PM, akshay naidu <akshaynaid...@gmail.com>
>> wrote:
>>
>>> try
>>>  --num-executors 3 --executor-cores 4 --executor-memory 2G --conf
>>> spark.scheduler.mode=FAIR
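>>>
>>> For instance, plugged into the submit command from the original mail in
>>> this thread (same master URL and script path), that would look like:
>>>
>>>     spark-submit --master spark://192.168.49.37:7077 \
>>>       --num-executors 3 --executor-cores 4 --executor-memory 2G \
>>>       --conf spark.scheduler.mode=FAIR \
>>>       /appdata/bblite-codebase/prima_diabetes_indians.py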
>>>
>>> On Mon, Jun 11, 2018 at 2:43 PM, Aakash Basu <aakash.spark....@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have submitted a job on a *4 node cluster*, where I see most of the
>>>> operations happening on one of the worker nodes while the other two
>>>> are simply idle.
>>>>
>>>> The picture below sheds light on that -
>>>>
>>>> How to properly distribute the load?
>>>>
>>>> My cluster conf (4 node cluster [1 driver; 3 slaves]) -
>>>>
>>>> *Cores - 6*
>>>> *RAM - 12 GB*
>>>> *HDD - 60 GB*
>>>>
>>>> My Spark Submit command is as follows -
>>>>
>>>> *spark-submit --master spark://192.168.49.37:7077
>>>> --num-executors 3 --executor-cores 5 --executor-memory 4G
>>>> /appdata/bblite-codebase/prima_diabetes_indians.py*
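>>>>
>>>> (A quick back-of-the-envelope check, assuming the specs above are per
>>>> worker: this asks for 3 x 5 = 15 of the 3 x 6 = 18 available cores and
>>>> 3 x 4G = 12G of executor memory across the three workers.)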
>>>>
>>>> What to do?
>>>>
>>>> Thanks,
>>>> Aakash.
>>>>
>>>
>>>
>>
>
