Hi! You're welcome, Navin. I'm also interested in the solution. Could you please share your remarks (and some code :)) after the implementation? Thanks.
Regards,
Florin
On Mon, May 9, 2016 at 7:20 AM, Navin Ipe <navin....@searchlighthealth.com> wrote:
> @Matthias: That's genius! I didn't know streams and allGrouping could be used like that.
> Given the way Storm introduced tick tuples, it would have been nice if Storm had a native technique for doing all this, but the ideas you've come up with are extremely good. I'm going to try implementing them right away.
> Thank you too, Florin!
>
> On Mon, May 9, 2016 at 12:48 AM, Matthias J. Sax <mj...@apache.org> wrote:
>> To synchronize this, use an additional "shut down bolt" with a parallelism of one. The "shut down bolt" must be notified by all parallel DbBolts after they have performed the flush. Once all notifications are received, there are no in-flight messages, and thus the "shut down bolt" can kill the topology safely.
>>
>> -Matthias
>>
>> On 05/08/2016 07:27 PM, Spico Florin wrote:
>>> Hi!
>>> There is the solution of sending a poison-pill message from the spout. One bolt will receive the poison pill and kill the topology via the Storm Nimbus API. One potential issue with this approach is that, depending on your topology's structure (the parallelism of your bolts and the time they need to execute their business logic), the poison pill may be swallowed by the bolt responsible for killing the topology before all the other in-flight messages have been processed. The consequence is that you cannot be sure all the messages sent by the spout were processed. Also, sharing the total number of sent messages between the executors in order to shut down once all messages are processed could be error-prone, since tuples can be processed many times (depending on your message-processing guarantee) or they could fail.
>>> I could not find a solution for this. Storm is intended to run on unbounded data.
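Matthias's "shut down bolt" idea above boils down to counting flush notifications: one singleton bolt waits for one notification per parallel DbBolt task, and only then kills the topology. A minimal plain-Java sketch of that counting logic, with Storm's API stripped away (the class and method names are illustrative, and the kill action is a stand-in for a real Nimbus kill call):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownCoordinator {
    private final int expectedNotifications;  // one per parallel DbBolt task
    private final AtomicInteger received = new AtomicInteger(0);
    private final Runnable killTopology;      // stand-in for a Nimbus kill call

    public ShutdownCoordinator(int expectedNotifications, Runnable killTopology) {
        this.expectedNotifications = expectedNotifications;
        this.killTopology = killTopology;
    }

    /** Called once by each DbBolt task after it has flushed to the DB. */
    public void onFlushNotification() {
        if (received.incrementAndGet() == expectedNotifications) {
            // All flushes are done => no in-flight messages => safe to kill.
            killTopology.run();
        }
    }
}
```

In a real topology this logic would live in the parallelism-one "shut down bolt", with `expectedNotifications` taken from the DbBolt's configured parallelism.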
>>> I hope this helps.
>>> Regards,
>>> Florin
>>>
>>> On Sunday, May 8, 2016, Matthias J. Sax <mj...@apache.org> wrote:
>>>> You can get the number of bolt instances from the TopologyContext that is provided in Bolt.prepare().
>>>>
>>>> Furthermore, you could put a loop into your topology, i.e., a bolt reads its own output; if you broadcast (i.e., allGrouping) this feedback-loop stream, you can let bolt instances talk to each other.
>>>>
>>>>     builder.setBolt("DbBolt", new MyDBBolt())
>>>>            .shuffleGrouping("spout")
>>>>            .allGrouping("DbBolt", "flush-stream");
>>>>
>>>> Here "flush-stream" is a second output stream of MyDBBolt, on which it sends a notification tuple after it has received the end-of-stream from the spout. Furthermore, once a bolt has received the signal via "flush-stream" from **all** parallel bolt instances, it can flush to the DB.
>>>>
>>>> Or something like this... Be creative! :)
>>>>
>>>> -Matthias
>>>>
>>>> On 05/08/2016 02:26 PM, Navin Ipe wrote:
>>>>> @Matthias: I agree about the batch processor, but my superior took the decision to use Storm, and he envisions more complexity later for which he needs Storm.
>>>>> I had considered the "end of stream" tuple earlier (my idea was to emit 10 consecutive nulls), but then the question was: how do I know how many bolt instances have been created, and how do I notify all the bolts? Because it's only after the last bolt finishes writing to the DB that I have to shut down the topology.
>>>>>
>>>>> @Jason: Thanks.
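The flush-stream pattern Matthias sketches requires each bolt task to know when it has heard the end-of-stream signal from *all* of its peers (the peer count comes from TopologyContext in a real bolt). A hypothetical, Storm-free sketch of that per-task bookkeeping; duplicate signals from the same task (e.g., from replays) are counted only once:

```java
import java.util.HashSet;
import java.util.Set;

public class FlushSignalTracker {
    private final int totalInstances;          // from TopologyContext in a real bolt
    private final Set<Integer> seenTasks = new HashSet<>();

    public FlushSignalTracker(int totalInstances) {
        this.totalInstances = totalInstances;
    }

    /**
     * Record an end-of-stream signal from a peer task.
     * Returns true once every parallel instance has reported,
     * i.e., when it is safe to flush to the DB.
     */
    public boolean signalReceived(int taskId) {
        seenTasks.add(taskId);                 // Set ignores duplicate task ids
        return seenTasks.size() == totalInstances;
    }
}
```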
>>>>> I had seen Storm Signals earlier (I think from one of your replies to someone else) and I had a look at the code too, but I am a bit wary because it's no longer being maintained and because of the open issues: https://github.com/ptgoetz/storm-signals/issues
>>>>>
>>>>> On Sun, May 8, 2016 at 5:40 AM, Jason Kusar <ja...@kusar.net> wrote:
>>>>>> You might want to check out Storm Signals:
>>>>>> https://github.com/ptgoetz/storm-signals
>>>>>>
>>>>>> It might give you what you're looking for.
>>>>>>
>>>>>> On Sat, May 7, 2016, 11:59 AM Matthias J. Sax <mj...@apache.org> wrote:
>>>>>>> As you mentioned already: Storm is designed to run topologies forever ;)
>>>>>>> If you have finite data, why do you not use a batch processor?
>>>>>>>
>>>>>>> As a workaround, you can embed "control messages" in your stream (or use an additional stream for them).
>>>>>>>
>>>>>>> If you want a topology to shut itself down, you could use
>>>>>>>
>>>>>>>     NimbusClient.getConfiguredClient(conf).getClient().killTopology(name);
>>>>>>>
>>>>>>> in your spout/bolt code.
>>>>>>>
>>>>>>> Something like:
>>>>>>> - Spout emits all tuples
>>>>>>> - Spout emits a special "end of stream" control tuple
>>>>>>> - Bolt1 processes everything
>>>>>>> - Bolt1 forwards the "end of stream" control tuple (when it receives it)
>>>>>>> - Bolt2 processes everything
>>>>>>> - Bolt2 receives the "end of stream" control tuple => flushes to DB => kills the topology
>>>>>>>
>>>>>>> But I guess this is a kind of weird pattern.
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 05/05/2016 06:13 AM, Navin Ipe wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I know Storm is designed to run forever. I also know about Trident's technique of aggregation.
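Matthias's end-of-stream control-tuple flow can be illustrated without Storm at all: a stand-in "bolt" buffers tuples and flushes only when the control marker arrives. All names here (EOS, runBolt) are invented for the sketch; in a real bolt, the flush branch would also issue the killTopology call:

```java
import java.util.ArrayList;
import java.util.List;

public class EndOfStreamDemo {
    // Illustrative control marker; any value that cannot collide with data works.
    static final String EOS = "__end_of_stream__";

    /** Buffers incoming "tuples" and flushes the buffer when EOS arrives. */
    static List<String> runBolt(List<String> input) {
        List<String> buffer = new ArrayList<>();
        List<String> flushed = new ArrayList<>();
        for (String tuple : input) {
            if (EOS.equals(tuple)) {
                flushed.addAll(buffer);   // stand-in for "flush to DB"
                buffer.clear();           // real code would kill the topology here
            } else {
                buffer.add(tuple);        // normal tuple: accumulate
            }
        }
        return flushed;
    }
}
```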
>>>>>>>> But shouldn't Storm have a way to let bolts know that a certain bunch of processing has been completed?
>>>>>>>>
>>>>>>>> Consider this topology:
>>>>>>>>
>>>>>>>>   Spout------>Bolt-A------>Bolt-B
>>>>>>>>     |            |--->Bolt-B
>>>>>>>>     |            \--->Bolt-B
>>>>>>>>     |--->Bolt-A------>Bolt-B
>>>>>>>>     |            |--->Bolt-B
>>>>>>>>     |            \--->Bolt-B
>>>>>>>>     \--->Bolt-A------>Bolt-B
>>>>>>>>                  |--->Bolt-B
>>>>>>>>                  \--->Bolt-B
>>>>>>>>
>>>>>>>> * From Bolt-A to Bolt-B, it is a FieldsGrouping.
>>>>>>>> * The Spout emits only a few tuples and then stops emitting.
>>>>>>>> * Bolt-A takes those tuples and generates millions of tuples.
>>>>>>>>
>>>>>>>> Bolt-B accumulates the tuples Bolt-A sends and needs to know when the Spout has finished emitting. Only then can Bolt-B start writing to SQL.
>>>>>>>>
>>>>>>>> Questions:
>>>>>>>> 1. How can all Bolt-B instances be notified that it is time to write to SQL?
>>>>>>>> 2. After all Bolt-B instances have written to SQL, how do we know that all of them have completed writing?
>>>>>>>> 3. How do we stop the topology? I know of localCluster.killTopology("HelloStorm"), but shouldn't there be a way to do it from a Bolt?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Navin
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Navin
>
> --
> Regards,
> Navin
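For question 1, it is worth noting why a single notification tuple is not enough: with a fields grouping, each Bolt-B task only sees the keys hashed to it, so every task must be signalled separately (hence the allGrouping broadcast suggested earlier in the thread). A toy model of how a fields grouping pins a key to one task (illustrative only; Storm's actual internal hashing differs):

```java
public class FieldsGroupingDemo {
    /**
     * Mimics the contract of a fields grouping: the grouping field's hash
     * selects the target task, so equal field values always land on the
     * same Bolt-B instance, and no single instance sees all keys.
     */
    static int targetTask(Object groupingFieldValue, int numTasks) {
        return Math.floorMod(groupingFieldValue.hashCode(), numTasks);
    }
}
```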