Hey Jonathan,

Are you referring to disk space used for storing persisted RDDs? Spark does not bound the amount of data persisted to disk. It's a similar story for Spark's shuffle disk output (Hadoop and other frameworks make the same assumption for their shuffle data, AFAIK).
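To make that concrete, here is a minimal sketch of what unbounded disk persistence looks like from the API side (the app name and local master are just placeholders for illustration; the storage levels are the standard ones from org.apache.spark.storage.StorageLevel):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val sc = new SparkContext(
      new SparkConf().setAppName("disk-persist-sketch").setMaster("local[*]"))

    val rdd = sc.parallelize(1 to 1000000)

    // DISK_ONLY writes every cached partition to local disk (under the
    // directories given by spark.local.dir). Spark enforces no upper bound
    // on how much space these blocks may consume.
    rdd.persist(StorageLevel.DISK_ONLY)

    // MEMORY_AND_DISK would keep partitions in memory and spill the ones
    // that don't fit to disk; the spilled portion is likewise unbounded.
    // rdd.persist(StorageLevel.MEMORY_AND_DISK)

    rdd.count()  // first action materializes (and persists) the RDD

The only related knob today controls *where* those blocks go (spark.local.dir), not how large they may grow.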
We could, in theory, add a storage level that bounds the amount of data persisted to disk and forces re-computation of any partition that does not fit. Before going that route, though, I'd be interested to hear more about a workload where that's relevant. Maybe it would make sense if people are using SSDs.

- Patrick

On Mon, Apr 13, 2015 at 8:19 AM, Jonathan Coveney <jcove...@gmail.com> wrote:
> I'm surprised that I haven't been able to find this via Google, but I
> haven't...
>
> What is the setting that requests some amount of disk space for the
> executors? Maybe I'm misunderstanding how this is configured...
>
> Thanks for any help!