Hi,
What I understood is that basically you set the number of tasks when the
topology starts and that cannot be changed later. If there are N tasks for
T1 threads with N > T1, that means that the tasks are indeed running
sequentially on at least some threads. When you add more power (nodes) and
rebalance the topology, let's say you have now T2 > T1 threads, you are
actually increasing the available cluster processing power since there will
be more tasks running in parallel. This limitation comes from the fact that
the "by field" grouping needs a never changing set of bolts so the same
field value will always be directed to the same bolt (task) : the "routes"
are calculated at starting time. I'm just wondering if this could be lifted
for other shuffle type... ?
If tasks = executors then you add more nodes and rebalance : some executors
will be idle.
Conclusion : set the right number of tasks at the beginning, since this is
the max parallel processing units that you will get throughout the topology
lifetime.
David


2014-07-16 2:36 GMT+02:00 Alok Kumbhare <kumbh...@usc.edu>:

> Hi,
> I have a few questions regarding the parallelism in Storm.
>
> I have gone through the documentation at
> http://storm.incubator.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html
> but I have some confusion regarding executors and tasks.
>
> From what I understand (to simplify, assume there is only one topology
> running):
>
> Worker processes runs executor threads (one or more executor per worker).
> Since each executor is an independent thread, they run in parallel.
>
> Now the documentation says "each Executor, runs one or more task of same
> component". My question is whether the tasks run in parallel or in
> sequence? What does it mean to have multiple tasks of the same component
> per executor?
>
>
> Further, the documentation mentions: "The number of tasks for a component
> is always the same throughout the lifetime of a topology, but the number of
> executors (threads) for a component can change over time."
>
> I don't understand this part. Assuming the default configuration where
> each executor runs one task (i.e. number of tasks is equal to the number of
> executors) and I run the topology with some initial configuration. Now, if
> I use the "rebalance" command and increase the number of executors, what
> happens to the number of tasks?
>
> Please help.
>
> Thanks,
> Alok
>

Reply via email to