Re: /tmp fills up to 100GB when using a window function

Gourav Sengupta Wed, 20 Dec 2017 11:59:02 -0800

I believe there is an option to set the temporary shuffle location to a
particular directory. While working with EMR, I set it to /mnt1/. Let me
know if you can't find it.
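
A minimal sketch of passing such a setting at submit time. It assumes the option in question is Spark's `spark.local.dir` (documented as the "scratch" directory for map output files and RDDs stored on disk, defaulting to /tmp) and that the job is launched with `spark-submit`. The helper `build_submit_args` and the script name `window_job.py` are hypothetical. One caveat: the Spark docs note that `spark.local.dir` is overridden by directories supplied by the cluster manager (`LOCAL_DIRS` on YARN, which EMR uses). On EMR, executors therefore take their scratch space from YARN's `yarn.nodemanager.local-dirs` instead.

```python
import shlex


def build_submit_args(app_script, local_dirs, extra_conf=None):
    """Build a spark-submit argv that points Spark's scratch space at local_dirs.

    Assumption: spark.local.dir is the "temporary shuffle location" option.
    Spark accepts a comma-separated list of directories for it.
    """
    conf = {"spark.local.dir": ",".join(local_dirs)}
    if extra_conf:
        conf.update(extra_conf)
    args = ["spark-submit"]
    for key, value in conf.items():
        # Each setting is passed as its own --conf KEY=VALUE pair.
        args += ["--conf", f"{key}={value}"]
    args.append(app_script)
    return args


if __name__ == "__main__":
    # window_job.py is a hypothetical script name.
    argv = build_submit_args("window_job.py", ["/mnt1/"])
    print(shlex.join(argv))
```

Inside a PySpark program, the same key can be set with `SparkSession.builder.config("spark.local.dir", "/mnt1/")`. It needs to be in place before the session starts, because an already-running context will not pick it up.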


On Mon, Dec 18, 2017 at 8:10 PM, Mihai Iacob <mia...@ca.ibm.com> wrote:

> This code generates files under /tmp...blockmgr... which do not get
> cleaned up after the job finishes.
>
> Is anything wrong with the code below? Or are there any known issues with
> Spark not cleaning up /tmp files?
>
>
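> from pyspark.sql import Window
> from pyspark.sql.functions import rank
>
> # Rank rows within each ('***', 'date_str') group, ordered by '***'.
> window = (Window
>           .partitionBy('***', 'date_str')
>           .orderBy(sqlDf['***']))
>
> sqlDf = sqlDf.withColumn("***", rank().over(window))
> # Keep rank-1 rows; ties share rank 1, so a group may keep several rows.
> df_w_least = sqlDf.filter("***=1")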
>
> Regards,
>
> *Mihai Iacob*
> DSX Local <https://datascience.ibm.com/local> - Security, IBM Analytics
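
To check how much space leftover block-manager directories are using, a stdlib-only sketch like the following walks a scratch root and sums the size of every directory whose name contains "blockmgr", as in the report above. It assumes the root is /tmp (Spark's default scratch location). `find_blockmgr_dirs` and `dir_size` are hypothetical helpers. It only measures and deletes nothing, because a directory may still belong to a running application.

```python
import os


def dir_size(path):
    """Total size in bytes of the files under path, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            if os.path.islink(fp):
                continue
            try:
                total += os.path.getsize(fp)
            except OSError:
                # File vanished between listing and stat (e.g. a job cleaned it up).
                pass
    return total


def find_blockmgr_dirs(root="/tmp"):
    """Return (path, size_bytes) for every directory under root whose name
    contains "blockmgr", largest first. Assumes /tmp is the scratch root."""
    results = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for d in dirnames:
            full = os.path.join(dirpath, d)
            if "blockmgr" in d and not os.path.islink(full):
                results.append((full, dir_size(full)))
        # Don't descend into matched directories; they were already measured.
        dirnames[:] = [d for d in dirnames if "blockmgr" not in d]
    results.sort(key=lambda item: item[1], reverse=True)
    return results


if __name__ == "__main__":
    for path, size in find_blockmgr_dirs():
        print(f"{size / 1024**3:8.2f} GiB  {path}")
```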
