I think the concept of a task in Spark is at the same level as a task in MR. Usually in MR we need to specify the memory for each mapper/reducer task. And I believe the executor is not a user-facing concept; it's a Spark-internal concept. Spark users don't need to know about executors, but they do need to know about tasks.
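As far as I know there is no per-task memory setting in Spark today, so the closest MR-style reasoning is to work backwards from the executor settings. A rough sketch (the numbers and app name below are made up, not recommendations):

    import org.apache.spark.SparkConf

    // Hypothetical sizing: Spark grants memory per executor JVM, not per task,
    // so estimate the per-task share from the executor settings.
    val conf = new SparkConf()
      .setAppName("memory-per-task-sketch")   // hypothetical app name
      .set("spark.executor.memory", "4g")     // heap granted to each executor JVM
      .set("spark.executor.cores", "4")       // task slots (threads) per executor
      .set("spark.task.cpus", "1")            // cores claimed by each task

    // Back-of-envelope: 4g / (4 cores / 1 cpu per task) = roughly 1g per
    // concurrently running task -- an average, not an enforced per-task limit,
    // since all tasks share the same executor heap.

This is only an estimate of the average share; nothing stops a single task from allocating far more of the executor heap than that.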
On Tue, May 26, 2015 at 5:09 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

> This is the first time I hear that “one can specify the RAM per task” –
> the RAM is granted per Executor (JVM). On the other hand, each Task operates
> on ONE RDD Partition, so you can say that this is “the RAM allocated to the
> Task to process” – but it is still within the boundaries allocated to the
> Executor (JVM) within which the Task is running. Also, while running, any
> Task, like any JVM Thread, can request as much additional RAM (e.g. for new
> Object instances) as is available in the Executor, aka the JVM Heap.
>
> *From:* canan chen [mailto:ccn...@gmail.com]
> *Sent:* Tuesday, May 26, 2015 9:30 AM
> *To:* Evo Eftimov
> *Cc:* user@spark.apache.org
> *Subject:* Re: How does spark manage the memory of executor with multiple tasks
>
> Yes, I know that one task represents a JVM thread. This is what confused me.
> Usually users want to specify memory at the task level, so how can I do that
> if a task is a thread and multiple tasks run in the same executor? I don't
> even know how many threads there will be. Besides that, if one task causes an
> OOM, it will cause the other tasks in the same executor to fail too. There's
> no isolation between tasks.
>
> On Tue, May 26, 2015 at 4:15 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:
>
> An Executor is a JVM instance spawned and running on a Cluster Node (Server
> machine). A Task is essentially a JVM Thread – you can have as many Threads
> as you want per JVM. You will also hear about “Executor Slots” – these are
> essentially the CPU Cores available on the machine and granted for use to
> the Executor.
>
> Ps: what creates ongoing confusion here is that the Spark folks have
> “invented” their own terms to describe the design of what is essentially a
> Distributed OO Framework facilitating Parallel Programming and Data
> Management in a Distributed Environment, BUT have not provided a clear
> dictionary/explanation linking these “inventions” with standard concepts
> familiar to every Java, Scala etc. developer.
>
> *From:* canan chen [mailto:ccn...@gmail.com]
> *Sent:* Tuesday, May 26, 2015 9:02 AM
> *To:* user@spark.apache.org
> *Subject:* How does spark manage the memory of executor with multiple tasks
>
> Since spark can run multiple tasks in one executor, I am curious to know how
> spark manages memory across these tasks. Say one executor has 1GB of memory;
> if that executor can run 10 tasks simultaneously, then each task can consume
> 100MB on average. Do I understand it correctly? Otherwise it doesn't make
> sense to me that spark runs multiple tasks in one executor.
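Picking up Evo's point in the quoted thread that a Task is essentially a JVM Thread: a quick way to see this is to have each task report the thread it runs on. This is a minimal sketch (it assumes a local master purely so it runs standalone); all of those threads live inside the same executor JVM and share one heap, which is why there is no memory isolation between tasks.

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch: each task reports the executor thread it runs on.
    // local[4] is used only so the example runs on a single machine.
    val sc = new SparkContext(
      new SparkConf().setAppName("task-as-thread").setMaster("local[4]"))

    sc.parallelize(1 to 8, 8)
      .map(i => s"partition $i ran on thread ${Thread.currentThread().getName}")
      .collect()
      .foreach(println)

    sc.stop()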