From: Saatvik Shah [mailto:saatvikshah1...@gmail.com]
Sent: Tuesday, June 20, 2017 8:50 PM
To: Mendelson, Assaf
Cc: user@spark.apache.org
Subject: Re: Merging multiple Pandas dataframes
Hi Assaf,
Thanks for the suggestion on checkpointing - I'll need to read up more on
that.
My current implementation seems to crash with a GC memory limit exceeded
error when I keep multiple persist calls active across a large number of
files.
Thus, I was also thinking about the constant calls to
Note that depending on the number of iterations, the query plan for the
dataframe can become long and this can cause slowdowns (or even crashes).
A possible solution would be to checkpoint (or simply save and reload the
dataframe) every once in a while. When reloading from disk, the newly loaded