Hey Stephen, Michael,

Yeah, I feared as much: searching the docs and the API did not surface any reliable, elegant way of doing that short of a "RouterBolt". If setting the parallelism of a component is enough to load-balance its processing across the different machines in the Storm cluster, then that would suffice for my use case. However, the documentation here <https://storm.incubator.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html> says that executors are threads, and it does not explicitly say anywhere that these threads are spawned across different nodes of the cluster. I want to avoid the possibility of the threads being spawned only locally rather than distributed among the cluster nodes.

Andrew.

On Thu, Jul 17, 2014 at 2:46 AM, Michael Rose <mich...@fullcontact.com> wrote:

> Maybe we can help with your topology design if you let us know what you're
> doing that requires you to shuffle half of the whole stream output to each
> of the two different types of bolts.
>
> If bolt b1 and bolt b2 are both instances of ExampleBolt (and not two
> different types) as above, there's no point to doing this. Setting the
> parallelism will make sure that data is partitioned across machines (by
> default, setting parallelism sets tasks = executors = parallelism).
>
> Unfortunately, I don't know of any way to do this other than shuffling the
> output to a new bolt, e.g. bolt "b0", a 'RouterBolt', then having bolt b0
> round-robin the received tuples between two streams, then have b1 and b2
> shuffle over those streams instead.
>
> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
> mich...@fullcontact.com
>
> On Wed, Jul 16, 2014 at 5:40 PM, Andrew Xor <andreas.gramme...@gmail.com>
> wrote:
>
>> Hi Tomas,
>>
>> As I said in my previous mail, the grouping is for a bolt *task*, not for
>> the actual number of spawned bolts; for example, let's say you have two
>> bolts that have a parallelism hint of 2 and these two bolts are wired to
>> the same spout. If you set the bolts as such:
>>
>>     tb.setBolt("b1", new ExampleBolt(), 2 /* p-hint */).shuffleGrouping("spout1");
>>     tb.setBolt("b2", new ExampleBolt(), 2 /* p-hint */).shuffleGrouping("spout1");
>>
>> Then each of the tasks will receive half of the spout tuples, but each
>> actual spawned bolt will receive all of the tuples emitted from the spout.
>> This is more evident if you set up a counter in the bolt counting how many
>> tuples it has received and test this with no parallelism hint, as such:
>>
>>     tb.setBolt("b1", new ExampleBolt()).shuffleGrouping("spout1");
>>     tb.setBolt("b2", new ExampleBolt()).shuffleGrouping("spout1");
>>
>> Now you will see that both bolts will receive all tuples emitted by
>> spout1.
>>
>> Hope this helps.
>>
>> Andrew.
>>
>> On Thu, Jul 17, 2014 at 2:33 AM, Tomas Mazukna <tomas.mazu...@gmail.com>
>> wrote:
>>
>>> Andrew,
>>>
>>> when you connect your bolt to your spout you specify the grouping. If
>>> you use shuffle grouping then any free bolt gets the tuple - in my
>>> experience even in lightly loaded topologies the distribution amongst bolts
>>> is pretty even. If you use all grouping then all bolts receive a copy of
>>> the tuple.
>>> Use shuffle grouping and each of your bolts will get about 1/3 of the
>>> workload.
>>>
>>> Tomas
>>>
>>> On Wed, Jul 16, 2014 at 7:05 PM, Andrew Xor <andreas.gramme...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to distribute the spout output to its subscribed bolts
>>>> evenly; let's say that I have a spout that emits tuples and three bolts
>>>> that are subscribed to it. I want each of the three bolts to receive 1/3rd
>>>> of the output (or emit a tuple to each one of these bolts in turn).
>>>> Unfortunately, as far as I understand, all bolts will receive all of the
>>>> emitted tuples of that particular spout regardless of the grouping defined
>>>> (as grouping, from my understanding, is for bolt *tasks*, not actual bolts).
>>>>
>>>> I've searched a bit and I can't seem to find a way to accomplish
>>>> that... is there a way to do that or am I searching in vain?
>>>>
>>>> Thanks.
>>>
>>> --
>>> Tomas Mazukna
>>> 678-557-3834
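
For reference, the routing step Michael describes above (a bolt "b0" that alternates received tuples between two streams) could be sketched roughly as follows. This is only the stream-selection logic in plain Java, so it runs without a Storm dependency; the class name RoundRobinRouter and the stream ids "to-b1" and "to-b2" are made up for illustration and do not come from the thread or from Storm. In an actual bolt, the chosen id would presumably be passed to the collector's emit call for a stream declared in declareOutputFields, with b1 and b2 each subscribing to one of those streams on "b0".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of the decision a hypothetical "RouterBolt" would make per tuple:
 * pick the next output stream in strict rotation.
 * Assumption: the bolt runs with a single task, so one counter is enough.
 */
final class RoundRobinRouter {
    private final List<String> streamIds; // hypothetical stream names
    private int next = 0;                 // index of the stream to use next

    RoundRobinRouter(List<String> streamIds) {
        if (streamIds.isEmpty()) {
            throw new IllegalArgumentException("at least one stream id is required");
        }
        this.streamIds = new ArrayList<>(streamIds);
    }

    /** Returns the stream id for the next tuple and advances the rotation. */
    String nextStream() {
        String id = streamIds.get(next);
        next = (next + 1) % streamIds.size();
        return id;
    }

    public static void main(String[] args) {
        RoundRobinRouter router =
                new RoundRobinRouter(Arrays.asList("to-b1", "to-b2"));
        // Inside a real bolt's execute(), this is where the tuple would be
        // emitted on the selected stream instead of printed.
        for (int i = 0; i < 4; i++) {
            System.out.println("tuple " + i + " -> " + router.nextStream());
        }
    }
}
```

With two stream ids, consecutive tuples alternate between them, so each downstream bolt sees half of the spout output rather than all of it.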