Re: How to let a topology know that it's time to stop?

Nathan Leung Sun, 08 May 2016 15:12:02 -0700

Alternative is to use a control message on a separate stream that goes to
all bolt tasks using all grouping.
On May 8, 2016 3:20 PM, "Matthias J. Sax" <mj...@apache.org> wrote:


> To synchronize this, use an additional "shut down bolt" that used
> parallelism of one. "shut down bolt" must be notified by all parallel
> DbBolts after they performed the flush. If all notifications are
> received, there are not in-flight message and thus "shut down bolt" can
> kill the topology safely.
>
> -Matthias
>
>
>
> On 05/08/2016 07:27 PM, Spico Florin wrote:
> > hi!
> >   there is this solution of sending a poison pill message from the
> > spout. on bolt wil receiv your poison pill and will kill topology via
> > storm storm nimbus API. one potentential issue whith this approach is
> > that due to your topology structure regarding the parralelism of your
> > bolts nd the time required by themto excute their bussineess logic, is
> > that the poison pill to be swallowed by the one bolt responsilble for
> > killing the topology, before all the other messages that are in-flight
> > to be processed. the conseuence is that you cannot be sure that all the
> > messagess sent by the spout were processed. also sharing the total
> > number of sent messages between the excutors in order to shutdown when
> > all messages were processed coul be error prone since  tuple can be
> > processed many times (depending on your guaranteee message processing)
> > or they could be failed.
> >   i coul not find  a solution for this. storm is intended to run
> > forunbounded data.
> > i hope that thrse help,
> > regard,
> > florin
> >
> >
> > On Sunday, May 8, 2016, Matthias J. Sax <mj...@apache.org
> > <mailto:mj...@apache.org>> wrote:
> >
> >     You can get the number of bolt instances from TopologyContext that is
> >     provided in Bolt.prepare()
> >
> >     Furthermore, you could put a loop into your topology, ie, a bolt
> reads
> >     it's own output; if you broadcast (ie, allGrouping) this
> >     feedback-loop-stream you can let bolt instances talk to each other.
> >
> >     builder.setBolt("DbBolt", new MyDBBolt())
> >            .shuffleGrouping("spout")
> >            .allGrouping("flush-stream", "DbBolt");
> >
> >     where "flush-stream" is a second output stream of MyDBBolt() sending
> a
> >     notification tuple after it received the end-of-stream from spout;
> >     furthermore, if a bolt received the signal via "flush-stream" from
> >     **all** parallel bolt instances, it can flush to DB.
> >
> >     Or something like this... Be creative! :)
> >
> >
> >     -Matthias
> >
> >
> >     On 05/08/2016 02:26 PM, Navin Ipe wrote:
> >     > @Matthias: I agree about the batch processor, but my superior took
> the
> >     > decision to use Storm, and he visualizes more complexity later for
> >     which
> >     > he needs Storm.
> >     > I had considered the "end of stream" tuple earlier (my idea was to
> >     emit
> >     > 10 consecutive nulls), but then the question was how do I know how
> >     many
> >     > bolt instances have been created, and how do I notify all the
> bolts?
> >     > Because it's only after the last bolt finishes writing to DB, that
> I
> >     > have to shut down the topology.
> >     >
> >     > @Jason: Thanks. I had seen storm signals earlier (I think from one
> of
> >     > your replies to someone else) and I had a look at the code too,
> >     but am a
> >     > bit wary because it's no longer being maintained and because of the
> >     > issues: https://github.com/ptgoetz/storm-signals/issues
> >     >
> >     > On Sun, May 8, 2016 at 5:40 AM, Jason Kusar <ja...@kusar.net
> >     <javascript:;>
> >     > <mailto:ja...@kusar.net <javascript:;>>> wrote:
> >     >
> >     >     You might want to check out Storm Signals.
> >     >     https://github.com/ptgoetz/storm-signals
> >     >
> >     >     It might give you what you're looking for.
> >     >
> >     >
> >     >     On Sat, May 7, 2016, 11:59 AM Matthias J. Sax
> >     <mj...@apache.org <javascript:;>
> >     >     <mailto:mj...@apache.org <javascript:;>>> wrote:
> >     >
> >     >         As you mentioned already: Storm is designed to run
> topologies
> >     >         forever ;)
> >     >         If you have finite data, why do you not use a batch
> >     processor???
> >     >
> >     >         As a workaround, you can embed "control messages" in your
> >     stream
> >     >         (or use
> >     >         an additional stream for them).
> >     >
> >     >         If you want a topology to shut down itself, you could use
> >     >
> >
> `NimbusClient.getConfiguredClient(conf).getClient().killTopology(name);`
> >     >         in your spout/bolt code.
> >     >
> >     >         Something like:
> >     >          - Spout emit all tuples
> >     >          - Spout emit special "end of stream" control tuple
> >     >          - Bolt1 processes everything
> >     >          - Bolt1 forward "end of stream" control tuple (when it
> >     received it)
> >     >          - Bolt2 processed everything
> >     >          - Bolt2 receives "end of stream" control tuple => flush
> to DB
> >     >         => kill
> >     >         topology
> >     >
> >     >         But I guess, this is kinda weird pattern.
> >     >
> >     >         -Matthias
> >     >
> >     >         On 05/05/2016 06:13 AM, Navin Ipe wrote:
> >     >         > Hi,
> >     >         >
> >     >         > I know Storm is designed to run forever. I also know
> about
> >     >         Trident's
> >     >         > technique of aggregation. But shouldn't Storm have a way
> to
> >     >         let bolts
> >     >         > know that a certain bunch of processing has been
> completed?
> >     >         >
> >     >         > Consider this topology:
> >     >         > Spout------>Bolt-A------>Bolt-B
> >     >         >             |                  |--->Bolt-B
> >     >         >             |                  \--->Bolt-B
> >     >         >             |--->Bolt-A------>Bolt-B
> >     >         >             |                  |--->Bolt-B
> >     >         >             |                  \--->Bolt-B
> >     >         >             \--->Bolt-A------>Bolt-B
> >     >         >                                |--->Bolt-B
> >     >         >                                \--->Bolt-B
> >     >         >
> >     >         >   * From Bolt-A to Bolt-B, it is a FieldsGrouping.
> >     >         >   * Spout emits only a few tuples and then stops
> emitting.
> >     >         >   * Bolt A takes those tuples and generates millions of
> >     tuples.
> >     >         >
> >     >         >
> >     >         > *Bolt-B accumulates tuples that Bolt A sends and needs
> >     to know
> >     >         when
> >     >         > Spout finished emitting. Only then can Bolt-B start
> >     writing to
> >     >         SQL.*
> >     >         >
> >     >         > *Questions:*
> >     >         > 1. How can all Bolts B be notified that it is time to
> >     write to
> >     >         SQL?
> >     >         > 2. After all Bolts B have written to SQL, how to know
> >     that all
> >     >         Bolts B
> >     >         > have completed writing?
> >     >         > 3. How to stop the topology? I know of
> >     >         > localCluster.killTopology("HelloStorm"), but shouldn't
> there
> >     >         be a way to
> >     >         > do it from the Bolt?
> >     >         >
> >     >         > --
> >     >         > Regards,
> >     >         > Navin
> >     >
> >     >
> >     >
> >     >
> >     > --
> >     > Regards,
> >     > Navin
> >
>
>

Re: How to let a topology know that it's time to stop?

Reply via email to