@Matthias: I agree about the batch processor, but my superior took the decision to use Storm, and he visualizes more complexity later for which he needs Storm. I had considered the "end of stream" tuple earlier (my idea was to emit 10 consecutive nulls), but then the question was how do I know how many bolt instances have been created, and how do I notify all the bolts? Because it's only after the last bolt finishes writing to DB, that I have to shut down the topology.
@Jason: Thanks. I had seen storm signals earlier (I think from one of your replies to someone else) and I had a look at the code too, but am a bit wary because it's no longer being maintained and because of the issues: https://github.com/ptgoetz/storm-signals/issues On Sun, May 8, 2016 at 5:40 AM, Jason Kusar <ja...@kusar.net> wrote: > You might want to check out Storm Signals. > https://github.com/ptgoetz/storm-signals > > It might give you what you're looking for. > > On Sat, May 7, 2016, 11:59 AM Matthias J. Sax <mj...@apache.org> wrote: > >> As you mentioned already: Storm is designed to run topologies forever ;) >> If you have finite data, why do you not use a batch processor??? >> >> As a workaround, you can embed "control messages" in your stream (or use >> an additional stream for them). >> >> If you want a topology to shut down itself, you could use >> `NimbusClient.getConfiguredClient(conf).getClient().killTopology(name);` >> in your spout/bolt code. >> >> Something like: >> - Spout emit all tuples >> - Spout emit special "end of stream" control tuple >> - Bolt1 processes everything >> - Bolt1 forward "end of stream" control tuple (when it received it) >> - Bolt2 processed everything >> - Bolt2 receives "end of stream" control tuple => flush to DB => kill >> topology >> >> But I guess, this is kinda weird pattern. >> >> -Matthias >> >> On 05/05/2016 06:13 AM, Navin Ipe wrote: >> > Hi, >> > >> > I know Storm is designed to run forever. I also know about Trident's >> > technique of aggregation. But shouldn't Storm have a way to let bolts >> > know that a certain bunch of processing has been completed? >> > >> > Consider this topology: >> > Spout------>Bolt-A------>Bolt-B >> > | |--->Bolt-B >> > | \--->Bolt-B >> > |--->Bolt-A------>Bolt-B >> > | |--->Bolt-B >> > | \--->Bolt-B >> > \--->Bolt-A------>Bolt-B >> > |--->Bolt-B >> > \--->Bolt-B >> > >> > * From Bolt-A to Bolt-B, it is a FieldsGrouping. >> > * Spout emits only a few tuples and then stops emitting. >> > * Bolt A takes those tuples and generates millions of tuples. >> > >> > >> > *Bolt-B accumulates tuples that Bolt A sends and needs to know when >> > Spout finished emitting. Only then can Bolt-B start writing to SQL.* >> > >> > *Questions:* >> > 1. How can all Bolts B be notified that it is time to write to SQL? >> > 2. After all Bolts B have written to SQL, how to know that all Bolts B >> > have completed writing? >> > 3. How to stop the topology? I know of >> > localCluster.killTopology("HelloStorm"), but shouldn't there be a way to >> > do it from the Bolt? >> > >> > -- >> > Regards, >> > Navin >> >> -- Regards, Navin