That is not really possible; the whole project is rather large and I would
not like to release it before I have published the results.

But if there are no known issues with running Spark in a for loop, I will look
into other possible sources of memory leaks.

Thanks


On 20 Sep 2017 15:22, "Weichen Xu" <weichen...@databricks.com> wrote:

Spark manages memory allocation and release automatically. Can you post the
complete program? That would help with checking where things go wrong.

On Wed, Sep 20, 2017 at 8:12 PM, Alexander Czech <
alexander.cz...@googlemail.com> wrote:

> Hello all,
>
> I'm running a pyspark script that uses a for loop to create smaller
> chunks of my main dataset.
>
> some example code:
>
> for chunk in chunks:
>     my_rdd = sc.parallelize(chunk).flatMap(somefunc)
>     # do some stuff with my_rdd
>
>     my_df = make_df(my_rdd)
>     # do some stuff with my_df
>     my_df.write.parquet('./some/path')
>
> After a couple of loop iterations I always start to lose executors because of
> out of memory errors. Is there a way to free up memory after each iteration? Do I
> have to do it in Python or with Spark?
>
> Thanks
>
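A minimal sketch of what per-iteration cleanup could look like, assuming the
references kept alive between iterations are what accumulate. The chunk index
in the output path and the unpersist()/del/gc.collect() calls are illustrative
additions, not part of the original script or a confirmed fix:

import gc

for i, chunk in enumerate(chunks):
    my_rdd = sc.parallelize(chunk).flatMap(somefunc)
    # ... work with my_rdd ...

    my_df = make_df(my_rdd)
    # ... work with my_df ...
    # writing each chunk to its own sub-directory is an assumption here,
    # so repeated writes do not collide
    my_df.write.parquet('./some/path/chunk_{}'.format(i))

    # drop any cached blocks (a no-op if nothing was persisted) and release
    # the Python references so the objects can be garbage collected
    my_df.unpersist()
    my_rdd.unpersist()
    del my_df, my_rdd
    gc.collect()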
