I suspect the OOM happens on the executor side; you will have to check the
stack trace yourself if you cannot attach more info. Most likely it is due
to your user code.
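
If it turns out the loop itself is holding on to memory, explicitly
releasing each iteration's objects sometimes helps. A minimal sketch based
on your snippet (somefunc, make_df and the output path are your
placeholders, not anything I know about your code):

import gc

for chunk in chunks:
    my_rdd = sc.parallelize(chunk).flatMap(somefunc)
    my_df = make_df(my_rdd)
    my_df.write.parquet('./some/path')
    # Drop any cached blocks and the Python references before the next chunk.
    my_df.unpersist()
    my_rdd.unpersist()
    del my_rdd, my_df
    gc.collect()

As far as I know, unpersist() is a no-op if nothing was cached, so it is
safe to call unconditionally.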


Alexander Czech <alexander.cz...@googlemail.com> wrote on Thu, Sep 21, 2017 at 5:54 PM:

> That is not really possible; the whole project is rather large and I would
> not like to release it before I have published the results.
>
> But if there are no known issues with running Spark in a for loop, I will
> look into other possibilities for memory leaks.
>
> Thanks
>
>
> On 20 Sep 2017 15:22, "Weichen Xu" <weichen...@databricks.com> wrote:
>
> Spark manages memory allocation and release automatically. Can you post the
> complete program so we can help check what is wrong?
>
> On Wed, Sep 20, 2017 at 8:12 PM, Alexander Czech <
> alexander.cz...@googlemail.com> wrote:
>
>> Hello all,
>>
>> I'm running a PySpark script that uses a for loop to create
>> smaller chunks of my main dataset.
>>
>> some example code:
>>
>> for chunk in chunks:
>>     my_rdd = sc.parallelize(chunk).flatMap(somefunc)
>>     # do some stuff with my_rdd
>>
>>     my_df = make_df(my_rdd)
>>     # do some stuff with my_df
>>     my_df.write.parquet('./some/path')
>>
>> After a couple of loops I always start to lose executors because of
>> out-of-memory errors. Is there a way to free up memory after a loop? Do I
>> have to do it in Python or with Spark?
>>
>> Thanks
>>
>
>
>
