To be precise, the MesosExecutorBackend's Xms and Xmx are both set to
spark.executor.memory, so there's no question of the executor expanding
or contracting the memory it holds.
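For illustration, a minimal Scala sketch of what that pins down (this is
not the actual Spark source; executorMemoryMb stands in for the parsed
value of spark.executor.memory):

    // With Xms == Xmx, the JVM commits the full heap up front and never
    // shrinks it, no matter how many tasks the executor is running.
    def executorJavaOpts(executorMemoryMb: Int): Seq[String] =
      Seq(s"-Xms${executorMemoryMb}m", s"-Xmx${executorMemoryMb}m")

    // e.g. spark.executor.memory=200g pins a 204800 MB heap for the
    // executor's entire lifetime:
    val opts = executorJavaOpts(204800)  // List(-Xms204800m, -Xmx204800m)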
On Sat, Oct 17, 2015 at 5:38 PM, Bharath Ravi Kumar <reachb...@gmail.com> wrote:

> David, Tom,
>
> Thanks for the explanation. This confirms my suspicion that the
> executor was holding on to memory regardless of tasks in execution,
> once it had expanded to occupy memory in keeping with
> spark.executor.memory. There certainly is scope for improvement here,
> though I realize there will be substantial overheads in implementing
> memory release without compromising RDD caching and similar aspects.
> I'll explore alternatives / workarounds meanwhile.
>
> Thanks,
> Bharath
>
> On Sat, Oct 17, 2015 at 3:33 PM, Tom Arnfeld <t...@duedil.com> wrote:
>
>> Hi Bharath,
>>
>> When running jobs in fine-grained mode, each Spark task is sent to
>> Mesos as a task, which allows the offers system to maintain fairness
>> between different Spark applications (as you've described). Having
>> said that, unless your memory per node is hugely undersubscribed,
>> this is exactly the behaviour you'll see when running these jobs in
>> parallel.
>>
>> What you're seeing happens because even though there's a new Mesos
>> task for each Spark task (allowing CPU to be shared), the Spark
>> executors don't get killed even when they aren't doing any work,
>> which means the memory isn't released. The JVM doesn't allow for
>> flexible memory re-allocation (as far as I'm aware), which makes it
>> impossible for Spark to dynamically scale up the memory of the
>> executor over time as tasks start and finish.
>>
>> As Dave pointed out, the simplest way to solve this is to use a
>> higher-level tool that can run your Spark jobs through one Mesos
>> framework, and then you can let Spark distribute the resources more
>> effectively.
>>
>> I hope that helps!
>>
>> Tom.
>>
>> On 17 Oct 2015, at 06:47, Bharath Ravi Kumar <reachb...@gmail.com> wrote:
>>
>> Can someone respond if you're aware of the reason for such a memory
>> footprint? It seems unintuitive and hard to reason about.
>>
>> Thanks,
>> Bharath
>>
>> On Thu, Oct 15, 2015 at 12:29 PM, Bharath Ravi Kumar
>> <reachb...@gmail.com> wrote:
>>
>>> Resending since user@mesos bounced earlier. My apologies.
>>>
>>> On Thu, Oct 15, 2015 at 12:19 PM, Bharath Ravi Kumar
>>> <reachb...@gmail.com> wrote:
>>>
>>>> (Reviving this thread since I ran into similar issues...)
>>>>
>>>> I'm running two Spark jobs (in Mesos fine-grained mode), each
>>>> belonging to a different Mesos role, say low and high. The
>>>> low:high Mesos weights are 1:10. As expected, the low-priority
>>>> job occupies cluster resources to the maximum extent when running
>>>> alone. However, when the high-priority job is submitted, it does
>>>> not start and continues to await cluster resources (as seen in
>>>> the logs). Since the jobs run in fine-grained mode and the
>>>> low-priority tasks begin to finish, the high-priority job should
>>>> ideally be able to start and gradually take over cluster
>>>> resources as per the weights. However, I noticed that while the
>>>> "low" job gives up CPU cores with each completing task (e.g. a
>>>> reduction from 72 to 12 with default parallelism set to 72), the
>>>> memory resources are held on to (~500G out of 768G). The
>>>> spark.executor.memory setting appears to directly determine the
>>>> amount of memory the job holds on to. In this case, it was set to
>>>> 200G for the low-priority job and 100G for the high-priority job.
>>>> The nature of these jobs is such that setting the numbers to
>>>> smaller values (say 32g) resulted in job failures with
>>>> OutOfMemoryError. It appears that the Spark framework is
>>>> retaining memory (across tasks) proportional to
>>>> spark.executor.memory for the duration of the job and not
>>>> releasing memory as tasks complete. This defeats the purpose of
>>>> fine-grained mode execution, as the memory occupancy is
>>>> preventing the high-priority job from accepting the prioritized
>>>> CPU offers and beginning execution. Can this be explained /
>>>> documented better, please?
>>>>
>>>> Thanks,
>>>> Bharath
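To make the setup above concrete, a minimal sketch of the two job
configurations (assuming a Spark version that supports the
spark.mesos.role setting; the master URL is illustrative, and the role
weights themselves live in the Mesos master configuration, not in Spark):

    import org.apache.spark.SparkConf

    // Fine-grained mode shares CPUs task by task, but each executor's
    // heap stays fixed at spark.executor.memory for the job's lifetime.
    val lowConf = new SparkConf()
      .setMaster("mesos://master:5050")
      .set("spark.mesos.coarse", "false")    // fine-grained mode
      .set("spark.mesos.role", "low")        // role weighted 1
      .set("spark.executor.memory", "200g")  // held even between tasks

    val highConf = new SparkConf()
      .setMaster("mesos://master:5050")
      .set("spark.mesos.coarse", "false")
      .set("spark.mesos.role", "high")       // role weighted 10
      .set("spark.executor.memory", "100g")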
>>>> On Sat, Apr 11, 2015 at 10:59 PM, Tim Chen <t...@mesosphere.io> wrote:
>>>>
>>>>> (Adding spark user list)
>>>>>
>>>>> Hi Tom,
>>>>>
>>>>> If I understand correctly, you're saying that you're running into
>>>>> memory problems because the scheduler is allocating too many CPUs
>>>>> and not enough memory to accommodate them, right?
>>>>>
>>>>> In the case of fine-grained mode I don't think that's a problem,
>>>>> since we have a fixed amount of CPU and memory per task. However,
>>>>> in coarse-grained mode you can run into that problem if you're
>>>>> within the spark.cores.max limit while memory is a fixed number.
>>>>>
>>>>> I have a patch out to configure the maximum number of CPUs a
>>>>> coarse-grained executor should use, and it also allows multiple
>>>>> executors in coarse-grained mode. So you could, say, try to
>>>>> launch multiple executors of at most 4 cores each, with
>>>>> spark.executor.memory (+ overhead, etc.) apiece, on a slave.
>>>>> (https://github.com/apache/spark/pull/4027)
>>>>>
>>>>> It also might be interesting to include a cores-to-memory
>>>>> multiplier so that with a larger number of cores we scale the
>>>>> memory by some factor, but I'm not entirely sure that's intuitive
>>>>> to use or that people would know what to set it to, as it can
>>>>> likely change with different workloads.
>>>>>
>>>>> Tim
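A small sketch of the cores-to-memory multiplier Tim describes (the
function and the numbers are illustrative, not a real Spark setting):

    // Scale the executor heap with the number of cores actually granted,
    // plus a fixed overhead, instead of one flat spark.executor.memory.
    def scaledExecutorMemoryMb(grantedCores: Int,
                               mbPerCore: Int,
                               overheadMb: Int): Int =
      grantedCores * mbPerCore + overheadMb

    // e.g. 4 cores at 2048 MB per core plus 512 MB overhead => 8704 MB
    val heapMb = scaledExecutorMemoryMb(4, 2048, 512)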
>>>>> On Sat, Apr 11, 2015 at 9:51 AM, Tom Arnfeld <t...@duedil.com> wrote:
>>>>>
>>>>>> We're running Spark 1.3.0 (with a couple of patches over the top
>>>>>> for docker related bits).
>>>>>>
>>>>>> I don't think SPARK-4158 is related to what we're seeing; things
>>>>>> do run fine on the cluster, given a ridiculously large executor
>>>>>> memory configuration. As for SPARK-3535, although that looks
>>>>>> useful, I think we're seeing something else.
>>>>>>
>>>>>> Put a different way, the amount of memory required at any given
>>>>>> time by the Spark JVM process is directly proportional to the
>>>>>> amount of CPU it has, because more CPU means more tasks and more
>>>>>> tasks means more memory. Even if we're using coarse mode, the
>>>>>> amount of executor memory should be proportionate to the amount
>>>>>> of CPUs in the offer.
>>>>>>
>>>>>> On 11 April 2015 at 17:39, Brenden Matthews <bren...@diddyinc.com> wrote:
>>>>>>
>>>>>>> I ran into some issues with it a while ago, and submitted a
>>>>>>> couple of PRs to fix it:
>>>>>>>
>>>>>>> https://github.com/apache/spark/pull/2401
>>>>>>> https://github.com/apache/spark/pull/3024
>>>>>>>
>>>>>>> Do these look relevant? What version of Spark are you running?
>>>>>>>
>>>>>>> On Sat, Apr 11, 2015 at 9:33 AM, Tom Arnfeld <t...@duedil.com> wrote:
>>>>>>>
>>>>>>>> Hey,
>>>>>>>>
>>>>>>>> Not sure whether it's best to ask this on the spark mailing
>>>>>>>> list or the mesos one, so I'll try here first :-)
>>>>>>>>
>>>>>>>> I'm having a bit of trouble with out-of-memory errors in my
>>>>>>>> Spark jobs... it seems fairly odd to me that memory resources
>>>>>>>> can only be set at the executor level, and not also at the
>>>>>>>> task level. For example, as far as I can tell there's only a
>>>>>>>> spark.executor.memory config option.
>>>>>>>>
>>>>>>>> Surely the memory requirements of a single executor are quite
>>>>>>>> dramatically influenced by the number of concurrent tasks
>>>>>>>> running? Given a shared cluster, I have no idea what % of an
>>>>>>>> individual slave my executor is going to get, so I basically
>>>>>>>> have to set the executor memory to a value that's correct when
>>>>>>>> the whole machine is in use...
>>>>>>>>
>>>>>>>> Has anyone else running Spark on Mesos come across this, or
>>>>>>>> maybe someone could correct my understanding of the config
>>>>>>>> options?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Tom.
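To make Tom's sizing problem concrete, a short sketch (the per-task
memory figure is hypothetical):

    // With only an executor-level setting, you have to provision for
    // the worst case: every core on the slave running one of your tasks.
    def worstCaseExecutorMemoryMb(slaveCores: Int, perTaskMb: Int): Int =
      slaveCores * perTaskMb

    // e.g. a 24-core slave at ~1536 MB per concurrent task pushes
    // spark.executor.memory toward 36 GB, even when the job usually
    // gets only a fraction of the machine.
    val neededMb = worstCaseExecutorMemoryMb(24, 1536)  // 36864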