I believe there is an option to set the temporary shuffle location to a particular directory. When I was working with EMR, I set it to /mnt1/. Let me know if you are not able to find it.
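If it helps, the option I have in mind is most likely spark.local.dir, which controls where Spark writes its scratch data (shuffle files and the blockmgr-* directories) and defaults to /tmp. Below is a minimal sketch of setting it from PySpark. The path /mnt1/spark-tmp and the helper names local_dir_conf and build_session are only illustrative assumptions, not something from your job.

```python
# Sketch: point Spark's scratch space (shuffle files, blockmgr-* dirs) away from /tmp.
# Assumptions: spark.local.dir is the option meant above; /mnt1/spark-tmp is a
# hypothetical path; the helper names are illustrative only.


def local_dir_conf(dirs):
    """Build the config entry; spark.local.dir accepts a comma-separated list."""
    # Drop empty entries and trailing slashes, keeping a bare "/" intact.
    cleaned = [d.rstrip("/") or "/" for d in dirs if d]
    if not cleaned:
        raise ValueError("at least one scratch directory is required")
    return {"spark.local.dir": ",".join(cleaned)}


def build_session(conf, app_name="scratch-dir-demo"):
    """Create a SparkSession with the given settings (requires pyspark)."""
    # Imported lazily so local_dir_conf can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = local_dir_conf(["/mnt1/spark-tmp"])
print(conf)
# spark = build_session(conf)  # uncomment where pyspark is available
```

The setting has to be in place before the SparkContext starts, so it cannot be changed on a running session. Also note that a cluster manager can override it: on YARN (which is what EMR uses) the node manager's local directories take precedence, and in standalone mode the SPARK_LOCAL_DIRS environment variable does.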
On Mon, Dec 18, 2017 at 8:10 PM, Mihai Iacob <mia...@ca.ibm.com> wrote:

> This code generates files under /tmp...blockmgr... which do not get
> cleaned up after the job finishes.
>
> Anything wrong with the code below? or are there any known issues with
> spark not cleaning up /tmp files?
>
> window = Window.\
>     partitionBy('***', 'date_str').\
>     orderBy(sqlDf['***'])
>
> sqlDf = sqlDf.withColumn("***",rank().over(window))
> df_w_least = sqlDf.filter("***=1")
>
> Regards,
>
> *Mihai Iacob*
> DSX Local <https://datascience.ibm.com/local> - Security, IBM Analytics
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org