Thanks for replying, Drew.
I'm using a fields grouping on the tuples emitted by the spouts.

But here is what I'm trying to solve: the spouts are I/O intensive. They
periodically poll data from external sources and use resources such as
persistent connections (and thus network bandwidth), file descriptors,
memory, etc. I was thinking that if I could distribute these spout
instances across different worker machines, the load would be spread
more evenly.

Ex: I can set up the spout "MySpout" in my topology with 10 tasks, but I
would like them distributed at roughly 3 tasks per worker machine (not just
per worker process). How do I influence that kind of distribution? Right
now, I see that even when I set the number of worker processes to 3, Storm
places all the tasks of the spout into a single worker process.
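
To make the placement I'm after concrete, here is a minimal sketch in plain
Java. It does not call any Storm API and says nothing about how Storm's
scheduler actually works; the class and method names are made up purely for
illustration. It just deals 10 task ids round-robin over 3 workers, which
gives a 4/3/3 split instead of all 10 tasks on one worker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: shows the spread I would like to see,
// not what Storm does internally.
public class DesiredSpoutPlacement {

    // Deal task ids 0..numTasks-1 round-robin across numWorkers workers.
    static List<List<Integer>> spreadRoundRobin(int numTasks, int numWorkers) {
        if (numWorkers <= 0) {
            throw new IllegalArgumentException("numWorkers must be positive");
        }
        List<List<Integer>> placement = new ArrayList<>();
        for (int w = 0; w < numWorkers; w++) {
            placement.add(new ArrayList<>());
        }
        for (int task = 0; task < numTasks; task++) {
            placement.get(task % numWorkers).add(task);
        }
        return placement;
    }

    public static void main(String[] args) {
        // 10 spout tasks, 3 workers, as in the example above.
        List<List<Integer>> placement = spreadRoundRobin(10, 3);
        for (int w = 0; w < placement.size(); w++) {
            System.out.println("worker " + w + " -> tasks " + placement.get(w));
        }
    }
}
```

What I observe today is the equivalent of every task id landing in the
first list.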

Hope that clarifies.


On Fri, Mar 21, 2014 at 10:41 PM, Drew Goya <> wrote:

> What kind of grouping (if any) are you using on the tuples coming out of
> your spout?
> If you want them evenly spread across a number of worker bolts, set that
> bolt to subscribe to the stream using a shuffle grouping.
> Search there for "Stream groupings"
> On Thu, Mar 20, 2014 at 10:11 AM, Srinath C <> wrote:
>> Anyone?
>> On Wed, Mar 19, 2014 at 6:35 AM, Srinath C <> wrote:
>>> Hi,
>>>    Can anyone point me to some notes on how Storm decides to distribute
>>> the tasks among its workers? The behavior I am seeing is that all tasks of a
>>> particular type are being grouped into one worker process.
>>>    To add more detail on my use case: I have a spout that is sourcing
>>> tuples from a RabbitMQ cluster. I want to distribute the spout tasks across
>>> different Storm workers so that the throughput is higher and the load of
>>> ingesting the messages is distributed across all the workers in the Storm
>>> cluster. Any suggestions on how to influence the distribution of tasks?
>>> Thanks,
>>> Srinath.
