As you mentioned already: Storm is designed to run topologies forever ;) If you have finite data, why do you not use a batch processor???
As a workaround, you can embed "control messages" in your stream (or use an additional stream for them). If you want a topology to shut down itself, you could use `NimbusClient.getConfiguredClient(conf).getClient().killTopology(name);` in your spout/bolt code. Something like: - Spout emit all tuples - Spout emit special "end of stream" control tuple - Bolt1 processes everything - Bolt1 forward "end of stream" control tuple (when it received it) - Bolt2 processed everything - Bolt2 receives "end of stream" control tuple => flush to DB => kill topology But I guess, this is kinda weird pattern. -Matthias On 05/05/2016 06:13 AM, Navin Ipe wrote: > Hi, > > I know Storm is designed to run forever. I also know about Trident's > technique of aggregation. But shouldn't Storm have a way to let bolts > know that a certain bunch of processing has been completed? > > Consider this topology: > Spout------>Bolt-A------>Bolt-B > | |--->Bolt-B > | \--->Bolt-B > |--->Bolt-A------>Bolt-B > | |--->Bolt-B > | \--->Bolt-B > \--->Bolt-A------>Bolt-B > |--->Bolt-B > \--->Bolt-B > > * From Bolt-A to Bolt-B, it is a FieldsGrouping. > * Spout emits only a few tuples and then stops emitting. > * Bolt A takes those tuples and generates millions of tuples. > > > *Bolt-B accumulates tuples that Bolt A sends and needs to know when > Spout finished emitting. Only then can Bolt-B start writing to SQL.* > > *Questions:* > 1. How can all Bolts B be notified that it is time to write to SQL? > 2. After all Bolts B have written to SQL, how to know that all Bolts B > have completed writing? > 3. How to stop the topology? I know of > localCluster.killTopology("HelloStorm"), but shouldn't there be a way to > do it from the Bolt? > > -- > Regards, > Navin
signature.asc
Description: OpenPGP digital signature