Hi,

What I understood is that you set the number of tasks when the topology starts, and it cannot be changed afterwards.

If a component has N tasks on T1 threads (executors) with N > T1, then at least some threads run several tasks sequentially. When you add more power (nodes) and rebalance the topology so that you now have T2 > T1 threads, you increase the processing power actually used, since more tasks run in parallel.

This limitation comes from the fact that the fields ("by field") grouping needs a never-changing set of tasks, so that the same field value is always directed to the same bolt task: the "routes" are calculated at start time. I'm just wondering whether this could be lifted for other grouping types, such as shuffle?

If tasks = executors and you then add more nodes and rebalance, the extra executors will be idle.

Conclusion: set the right number of tasks at the beginning, since this is the maximum number of parallel processing units you will get throughout the topology's lifetime.

David
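To make the arithmetic concrete, here is a minimal Python sketch of the reasoning above. It is a toy model, not Storm code: `assign_tasks`, `task_for_field` and `parallel_units` are hypothetical helpers. The round-robin placement and the CRC32-modulo routing are assumptions standing in for Storm's real scheduler and fields grouping.

```python
import zlib


def assign_tasks(num_tasks, num_executors):
    """Spread task ids over executors round-robin (assumed placement, not Storm's scheduler)."""
    executors = [[] for _ in range(num_executors)]
    for task_id in range(num_tasks):
        executors[task_id % num_executors].append(task_id)
    return executors


def task_for_field(value, num_tasks):
    """Simplified fields grouping: a stable hash modulo the fixed task count."""
    return zlib.crc32(value.encode("utf-8")) % num_tasks


def parallel_units(executors):
    """Count executors that hold at least one task, i.e. threads that can do work."""
    return sum(1 for tasks in executors if tasks)


# 4 tasks fixed at submit time, first on T1 = 2 threads, then rebalanced to T2 = 4.
before = assign_tasks(num_tasks=4, num_executors=2)     # two tasks share each thread
after = assign_tasks(num_tasks=4, num_executors=4)      # one task per thread
oversized = assign_tasks(num_tasks=4, num_executors=6)  # two threads get no task

print(parallel_units(before), parallel_units(after), parallel_units(oversized))

# The route for a field value depends only on the task count, so it is the
# same before and after the rebalance.
print(task_for_field("user-42", 4))
```

In this model the rebalance only moves tasks between threads. The task count, and with it the routing of each field value, stays the same, which is why the number of tasks caps the useful parallelism.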
2014-07-16 2:36 GMT+02:00 Alok Kumbhare <kumbh...@usc.edu>:

> Hi,
> I have a few questions regarding the parallelism in Storm.
>
> I have gone through the documentation at
> http://storm.incubator.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html
> but I have some confusion regarding executors and tasks.
>
> From what I understand (to simplify, assume there is only one topology
> running):
>
> Worker processes run executor threads (one or more executors per worker).
> Since each executor is an independent thread, they run in parallel.
>
> Now the documentation says "each Executor, runs one or more task of same
> component". My question is whether the tasks run in parallel or in
> sequence? What does it mean to have multiple tasks of the same component
> per executor?
>
> Further, the documentation mentions: "The number of tasks for a component
> is always the same throughout the lifetime of a topology, but the number of
> executors (threads) for a component can change over time."
>
> I don't understand this part. Assume the default configuration, where
> each executor runs one task (i.e. the number of tasks is equal to the
> number of executors), and I run the topology with some initial
> configuration. Now, if I use the "rebalance" command and increase the
> number of executors, what happens to the number of tasks?
>
> Please help.
>
> Thanks,
> Alok