I'm running sortByKey on a dataset that's nearly as large as the memory I've
allocated to the executors (I'd like to keep memory usage low so other jobs
can run), and I'm getting the vague "Filesystem closed" error.
When I re-run with more memory, it completes fine.

Shouldn't sortByKey spill to disk by default?  I'm fine with that: this is a
scheduled job where runtime isn't a big concern, and preserving memory for
other jobs is more important.  What can I do to ensure that sortByKey spills
to disk instead of failing with that error?
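
For reference, here's a stripped-down sketch of the job. The paths, parsing,
partition count, and config values are placeholders for my setup, not
recommendations; I'm mainly wondering whether settings like
spark.python.worker.memory (which caps how much a Python worker buffers
before spilling) or a larger numPartitions are the right knobs here:

    from pyspark import SparkConf, SparkContext

    # Illustrative settings only: spark.python.worker.memory limits the
    # per-worker buffer before PySpark's external sorter spills to disk,
    # and spark.shuffle.spill should already default to true.
    conf = (SparkConf()
            .setAppName("sortByKey-spill-test")
            .set("spark.python.worker.memory", "512m")
            .set("spark.shuffle.spill", "true"))
    sc = SparkContext(conf=conf)

    # Placeholder input and parsing for my actual job
    pairs = sc.textFile("hdfs:///path/to/input") \
              .map(lambda line: (line.split("\t")[0], line))

    # More partitions so each partition's sort stays within the worker budget
    sorted_pairs = pairs.sortByKey(ascending=True, numPartitions=400)
    sorted_pairs.saveAsTextFile("hdfs:///path/to/output")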



