Re: What could be the cause of an execution freeze on Hadoop for small datasets?

sam smith Sat, 11 Mar 2023 11:24:59 -0800

"In this case your program may work because effectively you are not using
the spark in yarn on the hadoop cluster": I am actually using YARN, as
mentioned (client mode).
I already know that, but it is not just about collectAsList: the execution
also freezes when, for example, I call save() on the dataset after the
transformations (calling save() on the dataset before them works fine).


I hope the question is clearer (for anybody who's reading) now.
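To see where the driver is stuck when show(), count(), collectAsList() or save() hangs, one option is to run the action with a time limit and dump the driver's thread stacks when the limit is hit. Below is a minimal plain-Java sketch of that idea, assuming only the JDK; the class ActionWatchdog and its method runWithTimeout are hypothetical helpers (not part of Spark), and the demo uses a sleeping task in place of a real Spark action.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical diagnostic helper (not a Spark API): bounds how long a blocking call may run.
public class ActionWatchdog {

    // Runs a potentially blocking action on a separate daemon thread and waits at most
    // `timeout` for it. On timeout it prints every thread's stack to stderr, interrupts
    // the worker thread and rethrows the TimeoutException.
    public static <T> T runWithTimeout(Callable<T> action, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "watched-action");
            worker.setDaemon(true); // a never-returning action will not keep the JVM alive
            return worker;
        });
        Future<T> future = executor.submit(action);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            dumpAllThreads();
            future.cancel(true); // only interrupts the local worker thread
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    // Prints the name, state and full stack of every live thread (similar to jstack output).
    static void dumpAllThreads() {
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            System.err.println("\"" + thread.getName() + "\" " + thread.getState());
            for (StackTraceElement frame : entry.getValue()) {
                System.err.println("    at " + frame);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // An action that finishes quickly returns its result normally.
        Integer quick = runWithTimeout(() -> 42, 1, TimeUnit.SECONDS);
        System.out.println("quick action returned " + quick);

        // A simulated "frozen" action (it just sleeps) is reported after 1 second.
        try {
            runWithTimeout(() -> {
                Thread.sleep(60_000);
                return -1;
            }, 1, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            System.out.println("slow action timed out");
        }
    }
}
```

In the driver program this could hypothetically wrap the call that hangs, e.g. `List<Row> rows = ActionWatchdog.runWithTimeout(() -> transformed.collectAsList(), 10, TimeUnit.MINUTES);`, where `transformed` stands for the dataset after the column transformations. Interrupting the local thread does not necessarily stop any work already submitted to the cluster; the stack dump is only meant to show where the driver is waiting. The JDK's jstack tool run against the driver process gives similar information without code changes.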

On Sat, 11 Mar 2023 at 20:15, Mich Talebzadeh <[email protected]>
wrote:

> collectAsList brings all the data into the driver, which is a single JVM
> on a single node. In this case your program may work because effectively
> you are not using the spark in yarn on the hadoop cluster. The benefit of
> Spark is that you can process a large amount of data using the memory and
> processors across multiple executors on multiple nodes.
>
>
> HTH
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Sat, 11 Mar 2023 at 19:01, sam smith <[email protected]>
> wrote:
>
>> Not sure what you mean by your question, but it is not helping in any case.
>>
>>
>> On Sat, 11 Mar 2023 at 19:54, Mich Talebzadeh <[email protected]>
>> wrote:
>>
>>>
>>>
>>> ... To note that if I execute collectAsList on the dataset at the
>>> beginning of the program....
>>>
>>> What do you think collectAsList does?
>>>
>>> On Sat, 11 Mar 2023 at 18:29, sam smith <[email protected]>
>>> wrote:
>>>
>>>> Hello guys,
>>>>
>>>> I am launching a Spark program programmatically (client mode) to run
>>>> on Hadoop. If I call methods such as show(), count() or collectAsList()
>>>> on the dataset (which show up in the Spark UI) after performing heavy
>>>> transformations on the columns, the execution freezes on Hadoop,
>>>> regardless of the dataset size (an intriguing issue for small datasets!).
>>>> Any idea what could be causing this type of issue?
>>>> Note that if I call collectAsList on the dataset at the beginning of
>>>> the program (before performing the transformations on the columns),
>>>> the method returns results correctly.
>>>>
>>>> Thanks.
>>>> Regards
>>>>
>>>>