It's quite impossible for anyone to answer your question about what is eating your memory without knowing which language you are using.
If you are using C, then it is always pointers - that's the memory issue. If you are using Python, one common cause is not using context managers (see "Context Managers and Python's with Statement" <https://realpython.com/python-with-statement/>), and another is not closing resources after use. In my experience you can process 3 years of data or more, IF you close the resources you open. I use the web GUI at http://spark:4040 to follow what Spark is doing.
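Roughly what I mean, as a minimal sketch (the input path, the per-day function and the report file are made up for illustration, not taken from your job): open everything inside the loop and release it again before the next day starts, so nothing piles up in the long-lived driver.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily-backfill").getOrCreate()

def process_one_day(df):
    # stand-in for your real per-day processing
    print(df.count())

for day in ["2022-03-28", "2022-03-29", "2022-03-30"]:
    df = spark.read.parquet(f"/data/events/{day}")   # hypothetical input path
    df.cache()
    try:
        process_one_day(df)
    finally:
        df.unpersist()                # drop this day's cache before the next iteration
        spark.catalog.clearCache()    # and anything else that got cached along the way

    # plain Python resources (files, DB connections, ...) go in a context
    # manager, so the file is closed as soon as the block ends
    with open(f"/tmp/report-{day}.txt", "w") as report:
        report.write(f"finished {day}\n")

spark.stop()   # stop once, when the whole backfill is done

The important parts are the unpersist/clearCache in the finally block, the with statement for ordinary resources, and calling spark.stop() only once at the very end.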
On Wed, 30 Mar 2022 at 17:41, Joris Billen <joris.bil...@bigindustries.be> wrote:

> Thanks for the answer - much appreciated! This forum is very useful :-)
>
> I didn't know the sparkcontext stays alive. I guess this is eating up memory.
> The eviction means that it knows it should clear some of the old cached
> memory to be able to store new data. In case anyone has good articles about
> memory leaks I would be interested to read them.
> I will try to add the following lines at the end of my job (as I cached the
> table in spark sql):
>
> sqlContext.sql("UNCACHE TABLE mytableofinterest ")
> spark.stop()
>
> Wrt looping: if I want to process 3 years of data, my modest cluster will
> never do it in one go, I would expect? I have to break it down into smaller
> pieces and run that in a loop (1 day is already lots of data).
>
> Thanks!
>
> On 30 Mar 2022, at 17:25, Sean Owen <sro...@gmail.com> wrote:
>
> The Spark context does not stop when a job does. It stops when you stop it.
> There could be many ways memory can leak. Caching maybe - but it will evict.
> You should be clearing caches when no longer needed.
>
> I would guess it is something else your program holds on to in its logic.
>
> Also consider not looping; there is probably a faster way to do it in one go.
>
> On Wed, Mar 30, 2022, 10:16 AM Joris Billen <joris.bil...@bigindustries.be>
> wrote:
>
>> Hi,
>> I have a pyspark job submitted through spark-submit that does some heavy
>> processing for 1 day of data. It runs with no errors. I have to loop over
>> many days, so I run this spark job in a loop. I notice that after a couple of
>> executions the memory usage is increasing on all worker nodes and eventually
>> this leads to failures. My job does some caching, but I understand that when
>> the job ends successfully, the sparkcontext is destroyed and the cache
>> should be cleared. However it seems that something keeps filling the memory
>> a bit more after each run. This is the memory behaviour over time, which in
>> the end will start leading to failures:
>>
>> (what we see is: green = physical memory used, green-blue = physical memory
>> cached, grey = memory capacity = straight line around 31GB)
>> This runs on a healthy Spark 2.4 and was already optimized to reach a stable
>> job in terms of spark-submit resource parameters
>> (driver-memory/num-executors/executor-memory/executor-cores/spark.locality.wait).
>> Any clue how to “really” clear the memory in between jobs? So basically
>> currently I can loop 10x and then need to restart my cluster so all memory
>> is cleared completely.
>>
>> Thanks for any info!
>>
>> <Screenshot 2022-03-30 at 15.28.24.png>


--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge
+47 480 94 297