@Matthias: I agree about the batch processor, but my superior made the
decision to use Storm, and he anticipates more complexity later, for which
he needs Storm.
I had considered the "end of stream" tuple earlier (my idea was to emit 10
consecutive nulls), but then the question was: how do I know how many bolt
instances have been created, and how do I notify all of them? I can shut
down the topology only after the last bolt has finished writing to the DB.
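
One way the counting question might be handled, as a rough sketch rather
than a tested Storm recipe: each upstream task sends one "end of stream"
marker to every downstream instance (for example via an all grouping on a
separate control stream), and each downstream instance counts distinct
upstream task IDs. In Storm, the expected count could probably be read in
prepare() with `context.getComponentTasks(componentId).size()` and the
sender with `tuple.getSourceTask()`. The class below is a hypothetical
helper of my own, not part of Storm, and it models only the counting logic
in plain Java.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not a Storm class): tracks which upstream tasks
 * have delivered their "end of stream" marker to one bolt instance.
 */
class EndOfStreamTracker {
    private final int expectedUpstreamTasks;
    private final Set<Integer> finishedTasks = new HashSet<>();

    EndOfStreamTracker(int expectedUpstreamTasks) {
        if (expectedUpstreamTasks <= 0) {
            throw new IllegalArgumentException("need at least one upstream task");
        }
        this.expectedUpstreamTasks = expectedUpstreamTasks;
    }

    /**
     * Records a marker from the given upstream task.
     * Returns true exactly once: on the call that completes the set.
     * Duplicate markers from the same task are ignored.
     */
    boolean markDone(int upstreamTaskId) {
        boolean firstTimeSeen = finishedTasks.add(upstreamTaskId);
        return firstTimeSeen && finishedTasks.size() == expectedUpstreamTasks;
    }

    public static void main(String[] args) {
        // Assumed setup: three upstream tasks with made-up IDs 11, 12, 13.
        EndOfStreamTracker tracker = new EndOfStreamTracker(3);
        for (int taskId : new int[] {11, 12, 11, 13}) {
            if (tracker.markDone(taskId)) {
                System.out.println("all upstream tasks finished, flush to DB now");
            }
        }
    }
}
```

The same tracker could be reused one level further down: every Bolt-B
instance sends a "done writing" marker to a single final bolt, which counts
the Bolt-B tasks and kills the topology once the last one has reported.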

@Jason: Thanks. I had seen Storm Signals earlier (I think in one of your
replies to someone else) and had a look at the code too, but I'm a bit
wary because it's no longer maintained and because of its open issues:
https://github.com/ptgoetz/storm-signals/issues

On Sun, May 8, 2016 at 5:40 AM, Jason Kusar <ja...@kusar.net> wrote:

> You might want to check out Storm Signals.
> https://github.com/ptgoetz/storm-signals
>
> It might give you what you're looking for.
>
> On Sat, May 7, 2016, 11:59 AM Matthias J. Sax <mj...@apache.org> wrote:
>
>> As you mentioned already: Storm is designed to run topologies forever ;)
>> If you have finite data, why do you not use a batch processor???
>>
>> As a workaround, you can embed "control messages" in your stream (or use
>> an additional stream for them).
>>
>> If you want a topology to shut down itself, you could use
>> `NimbusClient.getConfiguredClient(conf).getClient().killTopology(name);`
>> in your spout/bolt code.
>>
>> Something like:
>>  - Spout emits all tuples
>>  - Spout emits a special "end of stream" control tuple
>>  - Bolt1 processes everything
>>  - Bolt1 forwards the "end of stream" control tuple (once it receives it)
>>  - Bolt2 processes everything
>>  - Bolt2 receives the "end of stream" control tuple => flushes to DB =>
>>    kills the topology
>>
>> But I guess this is a kinda weird pattern.
>>
>> -Matthias
>>
>> On 05/05/2016 06:13 AM, Navin Ipe wrote:
>> > Hi,
>> >
>> > I know Storm is designed to run forever. I also know about Trident's
>> > technique of aggregation. But shouldn't Storm have a way to let bolts
>> > know that a certain bunch of processing has been completed?
>> >
>> > Consider this topology:
>> > Spout------>Bolt-A------>Bolt-B
>> >             |                  |--->Bolt-B
>> >             |                  \--->Bolt-B
>> >             |--->Bolt-A------>Bolt-B
>> >             |                  |--->Bolt-B
>> >             |                  \--->Bolt-B
>> >             \--->Bolt-A------>Bolt-B
>> >                                |--->Bolt-B
>> >                                \--->Bolt-B
>> >
>> >   * From Bolt-A to Bolt-B, it is a FieldsGrouping.
>> >   * Spout emits only a few tuples and then stops emitting.
>> >   * Bolt A takes those tuples and generates millions of tuples.
>> >
>> >
>> > *Bolt-B accumulates tuples that Bolt A sends and needs to know when
>> > Spout finished emitting. Only then can Bolt-B start writing to SQL.*
>> >
>> > *Questions:*
>> > 1. How can all Bolts B be notified that it is time to write to SQL?
>> > 2. After all Bolts B have written to SQL, how to know that all Bolts B
>> > have completed writing?
>> > 3. How to stop the topology? I know of
>> > localCluster.killTopology("HelloStorm"), but shouldn't there be a way to
>> > do it from the Bolt?
>> >
>> > --
>> > Regards,
>> > Navin
>>
>>


-- 
Regards,
Navin
