I'm running sortByKey on a dataset that's nearly as large as the memory I've allocated to the executors (I'd like to keep memory usage low so other jobs can run alongside), and I'm getting the vague "filesystem closed" error. When I re-run with more memory, it works fine.
By default, shouldn't sortByKey be spilling to disk? I'm fine with that: this is a scheduled job where runtime isn't a big issue, and preserving memory for other jobs is more important. What can I do to ensure that sortByKey spills to disk and doesn't trigger that error?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/sortByKey-not-spilling-to-disk-PySpark-1-3-tp25660.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
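[A configuration sketch relevant to the question above, under the assumption of Spark 1.3 defaults: in PySpark 1.3, sortByKey sorts each partition inside the Python worker process, and that sorter spills to disk once the worker exceeds `spark.python.worker.memory` (default 512m), a limit separate from executor JVM memory; `spark.shuffle.spill` must also be left at its default of true. The app name and values below are illustrative, not prescriptive.]

```python
# Sketch (untested): making spill-related settings explicit for a PySpark 1.3 job.
# spark.shuffle.spill and spark.python.worker.memory are real Spark 1.3 config
# keys; the values shown here are assumptions to tune for your cluster.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("sortByKey-with-spill")           # hypothetical app name
        .set("spark.shuffle.spill", "true")           # JVM-side spilling (default true in 1.3)
        .set("spark.python.worker.memory", "512m"))   # Python worker spills past this limit
sc = SparkContext(conf=conf)

# sortByKey then runs with the explicit spill settings above.
sorted_rdd = sc.parallelize([(3, "c"), (1, "a"), (2, "b")]).sortByKey()
```

Lowering `spark.python.worker.memory` makes the Python-side sorter spill earlier, which trades runtime for a smaller memory footprint, consistent with the goal of leaving memory free for other jobs.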