Re: Distribute Spout output among all bolts

Andrew Xor Wed, 16 Jul 2014 16:54:25 -0700

Hey Stephen, Michael,

 Yeah, I feared as much. Searching the docs and the API did not surface any
reliable and elegant way of doing that short of a "RouterBolt". If
setting the parallelism of a component is enough to load-balance the
processing across the different machines of the Storm cluster, then
this would suffice for my use case. However, the documentation here
<https://storm.incubator.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html>
says that executors are threads, and it does not explicitly say
anywhere that those threads are spawned across different nodes of the cluster.
I want to avoid the possibility of these threads only spawning locally and
not being distributed among the cluster nodes.
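
For reference, the round-robin step that such a "RouterBolt" would perform can be sketched in plain Java. This is only a minimal illustration and uses no Storm classes; the class name RoundRobinRouter and the stream names "to-b1" and "to-b2" are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: picks the output stream for each tuple in round-robin order. */
final class RoundRobinRouter {
    private final List<String> streamIds;
    private int next = 0;

    RoundRobinRouter(List<String> streamIds) {
        if (streamIds.isEmpty()) {
            throw new IllegalArgumentException("need at least one stream");
        }
        this.streamIds = new ArrayList<>(streamIds);
    }

    /** Returns the stream id the next tuple should go to, then advances the cursor. */
    String nextStream() {
        String id = streamIds.get(next);
        next = (next + 1) % streamIds.size();
        return id;
    }

    public static void main(String[] args) {
        // Stream names are assumptions made for this sketch.
        RoundRobinRouter router = new RoundRobinRouter(Arrays.asList("to-b1", "to-b2"));
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < 10; i++) {
            counts.merge(router.nextStream(), 1, Integer::sum);
        }
        System.out.println(counts); // each stream receives 5 of the 10 tuples
    }
}
```

In an actual bolt, I would expect this to translate to declaring one stream per target with declareStream(streamId, fields) in declareOutputFields, calling collector.emit(router.nextStream(), tuple, values) inside execute, and having b1 and b2 subscribe with shuffleGrouping("b0", streamId). The cursor is not synchronized, on the assumption that each bolt task is driven by a single thread.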


Andrew.


On Thu, Jul 17, 2014 at 2:46 AM, Michael Rose <mich...@fullcontact.com>
wrote:

> Maybe we can help with your topology design if you let us know what you're
> doing that requires you to shuffle half of the whole stream output to each
> of the two different types of bolts.
>
> If bolt b1 and bolt b2 are both instances of ExampleBolt (and not two
> different types) as above, there's no point to doing this. Setting the
> parallelism will make sure that data is partitioned across machines (by
> default, setting parallelism sets tasks = executors = parallelism).
>
> Unfortunately, I don't know of any way to do this other than shuffling the
> output to a new bolt, e.g. bolt "b0", a "RouterBolt", then having bolt b0
> round-robin the received tuples between two streams, and then having b1 and
> b2 shuffle over those streams instead.
>
>
>
> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
> mich...@fullcontact.com
>
>
> On Wed, Jul 16, 2014 at 5:40 PM, Andrew Xor <andreas.gramme...@gmail.com>
> wrote:
>
>> 
>> Hi Tomas,
>>
>>  As I said in my previous mail, the grouping applies to a bolt's *tasks*,
>> not to the number of bolts declared. For example, let's say you have two
>> bolts that have a parallelism hint of 2 and these two bolts are wired to
>> the same spout. If you set the bolts as such:
>>
>> tb.setBolt("b1", new ExampleBolt(), 2 /* p-hint */).shuffleGrouping("spout1");
>> tb.setBolt("b2", new ExampleBolt(), 2 /* p-hint */).shuffleGrouping("spout1");
>>
>> Then each of the tasks will receive half of the spout tuples, but each
>> declared bolt will receive all of the tuples emitted from the spout.
>> This is more evident if you set up a counter in the bolt that counts how
>> many tuples it has received and test this with no parallelism hint, as such:
>>
>> tb.setBolt("b1", new ExampleBolt()).shuffleGrouping("spout1");
>> tb.setBolt("b2", new ExampleBolt()).shuffleGrouping("spout1");
>>
>> Now you will see that both bolts will receive all tuples emitted by
>> spout1.
>>
>> Hope this helps.
>>
>> 
>> Andrew.
>>
>>
>> On Thu, Jul 17, 2014 at 2:33 AM, Tomas Mazukna <tomas.mazu...@gmail.com>
>> wrote:
>>
>>> Andrew,
>>>
>>> when you connect your bolt to your spout you specify the grouping. If
>>> you use shuffle grouping then any free bolt gets the tuple - in my
>>> experience even in lightly loaded topologies the distribution amongst bolts
>>> is pretty even. If you use all grouping then all bolts receive a copy of
>>> the tuple.
>>> Use shuffle grouping and each of your bolts will get about 1/3 of the
>>> workload.
>>>
>>> Tomas
>>>
>>>
>>> On Wed, Jul 16, 2014 at 7:05 PM, Andrew Xor <andreas.gramme...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>  I am trying to distribute the spout output evenly among its subscribed
>>>> bolts. Let's say that I have a spout that emits tuples and three bolts
>>>> that are subscribed to it. I want each of the three bolts to receive 1/3
>>>> of the output (or to emit a tuple to each one of these bolts in turn).
>>>> Unfortunately, as far as I understand, all bolts will receive all of the
>>>> tuples emitted by that particular spout regardless of the grouping defined
>>>> (as grouping, from my understanding, is for bolt *tasks*, not actual bolts).
>>>>
>>>>  I've searched a bit and I can't seem to find a way to accomplish
>>>> that. Is there a way to do that, or am I searching in vain?
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>>
>>> --
>>> Tomas Mazukna
>>> 678-557-3834
>>>
>>
>>
>
