That is not really possible; the whole project is rather large and I would not like to release it before I have published the results.
But if there are no known issues with running Spark in a for loop, I will look into other possibilities for memory leaks. Thanks

On 20 Sep 2017 15:22, "Weichen Xu" <weichen...@databricks.com> wrote:

Spark manages memory allocation and release automatically. Can you post the complete program? That would help with checking where it goes wrong.

On Wed, Sep 20, 2017 at 8:12 PM, Alexander Czech <alexander.cz...@googlemail.com> wrote:

> Hello all,
>
> I'm running a pyspark script that uses a for loop to create smaller
> chunks of my main dataset.
>
> Some example code:
>
>     for chunk in chunks:
>         my_rdd = sc.parallelize(chunk).flatMap(somefunc)
>         # do some stuff with my_rdd
>
>         my_df = make_df(my_rdd)
>         # do some stuff with my_df
>         my_df.write.parquet('./some/path')
>
> After a couple of loops I always start to lose executors because of
> out-of-memory errors. Is there a way to free up memory after a loop? Do I
> have to do it in Python or with Spark?
>
> Thanks
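For what it is worth, if the buildup is on the driver side, one thing worth trying is explicit per-iteration cleanup: unpersist anything that may have been cached and drop the Python references so the old RDD and DataFrame objects can be garbage collected. A minimal sketch, reusing the placeholder names from the snippet above (chunks, somefunc and make_df are assumed to exist; the per-chunk output path is also an assumption, added so repeated writes do not collide):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("chunked-writes").getOrCreate()
    sc = spark.sparkContext

    for i, chunk in enumerate(chunks):  # chunks: placeholder from the post
        my_rdd = sc.parallelize(chunk).flatMap(somefunc)
        my_df = make_df(my_rdd)         # make_df: placeholder from the post
        # write each chunk to its own sub-directory so iterations do not collide
        my_df.write.parquet('./some/path/chunk_%d' % i)

        # unpersist is a no-op unless the data was cached, but it is safe to
        # call and releases any cached blocks held for this iteration
        my_df.unpersist()
        my_rdd.unpersist()
        # drop the driver-side references so the old objects can be GC'd
        del my_rdd, my_df

This only addresses driver-side buildup; if the executors themselves are the ones going out of memory, the per-record behaviour of somefunc would be the next thing to check.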