Hi all,

For my master's thesis I will be characterising the performance of two-level schedulers like Mesos. After reading the paper https://www.cs.berkeley.edu/~alig/papers/mesos.pdf, in which Spark is also introduced, I am wondering how some of the experiments and results came about. If this is not the place to ask these questions, or if someone knows a better place, please let me know.
I am wondering whether the experiment would show the same results with the current release of Spark. In the macro-benchmarks (Fig. 5c) we can see 4 instances of Spark applications being run (though the text talks of 5 instances). Within a single instance, Spark seems to grow elastically, especially in the intervals [0,200] and [900,1100].

This alone would be hard to recreate with current Spark on Mesos, because once an application context starts, it either:

1) allocates all available nodes in the cluster and does not scale up or down during the application's lifetime (CoarseGrained mode), or
2) allocates all memory and does not release it, although it does scale up and down in terms of CPUs (FineGrained mode).

Even FineGrained mode would not work well alongside other frameworks that need a lot of memory: they simply could not allocate it, because the cluster's memory stays taken even while the Spark application is idle. Of course we could limit Spark's memory usage, but this defeats the purpose of having Mesos.

Does someone know:

1) Was there a memory limit for Spark during the experiments in the paper (and was what is nowadays called FineGrained mode used), so that other frameworks were also able to run? or
2) Was the Spark architecture vastly different back then?

Any other remarks, even anecdotal, are very welcome.

Hans
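
P.S. To make the memory concern concrete, here is a toy Python model of how I understand the two modes. It is only a sketch of my own assumptions, not Spark's or Mesos's actual code: the names `Held`, `held_resources`, `free_memory_for_others` and `mem_cap_gb`, as well as the cluster sizes (64 CPUs, 256 GB), are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Held:
    """Resources one Spark application holds at some point in time."""
    cpus: int
    mem_gb: int


def held_resources(mode: str, active_tasks: int, cluster_cpus: int,
                   cluster_mem_gb: int, mem_cap_gb: Optional[int] = None) -> Held:
    """Toy model of what a single application holds (assumptions only)."""
    # Assumption: without a cap, the application claims all cluster memory
    # at start-up and keeps it for its whole lifetime, in both modes.
    mem = cluster_mem_gb if mem_cap_gb is None else min(mem_cap_gb, cluster_mem_gb)
    if mode == "coarse":
        # Assumption: all CPUs are held whether tasks are running or not.
        return Held(cluster_cpus, mem)
    if mode == "fine":
        # Assumption: one CPU per running task, released when the task ends.
        return Held(min(active_tasks, cluster_cpus), mem)
    raise ValueError(f"unknown mode: {mode!r}")


def free_memory_for_others(held: Held, cluster_mem_gb: int) -> int:
    """Memory left over for other frameworks on the same cluster."""
    return cluster_mem_gb - held.mem_gb


if __name__ == "__main__":
    for mode in ("coarse", "fine"):
        for tasks in (0, 16):
            h = held_resources(mode, tasks, cluster_cpus=64, cluster_mem_gb=256)
            print(mode, tasks, h, free_memory_for_others(h, 256))
```

In this model an idle FineGrained application holds 0 CPUs but still all the memory, so other frameworks get nothing unless a cap such as `mem_cap_gb=64` is set, which is exactly the manual partitioning I would like to avoid.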