It's quite impossible for anyone to answer your question about what is eating your memory without knowing which language you are using.
If you are using C, then it is always pointers - that's the memory issue. If you are using Python, one common cause is not using context managers (see "Context Managers and Python's with Statement" <https://realpython.com/python-with-statement/>), and another is not closing resources after use. In my experience you can process 3 years of data or more, IF you close the resources you open. I use the web GUI at http://spark:4040 to follow what Spark is doing.
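Roughly what I mean, as a minimal sketch (the input path, the per-day function and the report file are made up for illustration, not taken from your job): open everything inside the loop and release it again before the next day starts, so nothing piles up in the long-lived driver.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily-backfill").getOrCreate()

def process_one_day(df):
    # stand-in for your real per-day processing
    print(df.count())

for day in ["2022-03-28", "2022-03-29", "2022-03-30"]:
    df = spark.read.parquet(f"/data/events/{day}")   # hypothetical input path
    df.cache()
    try:
        process_one_day(df)
    finally:
        df.unpersist()                # drop this day's cache before the next iteration
        spark.catalog.clearCache()    # and anything else that got cached along the way

    # plain Python resources (files, DB connections, ...) go in a context
    # manager, so the file is closed as soon as the block ends
    with open(f"/tmp/report-{day}.txt", "w") as report:
        report.write(f"finished {day}\n")

spark.stop()   # stop once, when the whole backfill is done

The important parts are the unpersist/clearCache in the finally block, the with statement for ordinary resources, and calling spark.stop() only once at the very end.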
On Wed, 30 Mar 2022 at 17:41, Joris Billen <joris.bil...@bigindustries.be> wrote:

> Thanks for the answer - much appreciated! This forum is very useful :-)
>
> I didn't know the sparkcontext stays alive. I guess this is eating up memory.
> The eviction means that it knows it should clear some of the old cached
> memory to be able to store new data. In case anyone has good articles about
> memory leaks I would be interested to read them.
> I will try to add the following lines at the end of my job (as I cached the
> table in spark sql):
>
> sqlContext.sql("UNCACHE TABLE mytableofinterest ")
> spark.stop()
>
> Wrt looping: if I want to process 3 years of data, my modest cluster will
> never do it in one go, I would expect? I have to break it down into smaller
> pieces and run that in a loop (1 day is already lots of data).
>
> Thanks!
>
> On 30 Mar 2022, at 17:25, Sean Owen <sro...@gmail.com> wrote:
>
> The Spark context does not stop when a job does. It stops when you stop it.
> There could be many ways memory can leak. Caching maybe - but it will evict.
> You should be clearing caches when no longer needed.
>
> I would guess it is something else your program holds on to in its logic.
>
> Also consider not looping; there is probably a faster way to do it in one go.
>
> On Wed, Mar 30, 2022, 10:16 AM Joris Billen <joris.bil...@bigindustries.be>
> wrote:
>
>> Hi,
>> I have a pyspark job submitted through spark-submit that does some heavy
>> processing for 1 day of data. It runs with no errors. I have to loop over
>> many days, so I run this spark job in a loop. I notice that after a couple of
>> executions the memory usage is increasing on all worker nodes and eventually
>> this leads to failures. My job does some caching, but I understand that when
>> the job ends successfully, the sparkcontext is destroyed and the cache
>> should be cleared. However it seems that something keeps filling the memory
>> a bit more after each run. This is the memory behaviour over time, which in
>> the end will start leading to failures:
>>
>> (what we see is: green = physical memory used, green-blue = physical memory
>> cached, grey = memory capacity = straight line around 31GB)
>> This runs on a healthy Spark 2.4 and was already optimized to reach a stable
>> job in terms of spark-submit resource parameters
>> (driver-memory/num-executors/executor-memory/executor-cores/spark.locality.wait).
>> Any clue how to “really” clear the memory in between jobs? So basically
>> currently I can loop 10x and then need to restart my cluster so all memory
>> is cleared completely.
>>
>> Thanks for any info!
>>
>> <Screenshot 2022-03-30 at 15.28.24.png>


--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge
+47 480 94 297