Hi!
  There is a solution of sending a poison pill message from the spout.
One bolt will receive the poison pill and kill the topology via the Storm
Nimbus API. One potential issue with this approach is that, depending on
your topology structure, the parallelism of your bolts, and the time
required by them to execute their business logic, the poison pill may be
swallowed by the one bolt responsible for killing the topology before all
the other in-flight messages have been processed. The consequence is that
you cannot be sure that all the messages sent by the spout were processed.
Also, sharing the total number of sent messages between the executors in
order to shut down when all messages were processed could be error prone,
since a tuple can be processed many times (depending on your message
processing guarantee) or it could fail.
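For what it's worth, making such a shared count tolerant of replays usually
means de-duplicating by message id. A minimal, framework-free sketch (class
and method names are my own, not Storm API):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: counts processed messages idempotently, so that tuples
// replayed by at-least-once delivery do not inflate the total.
// All names here are hypothetical, not part of the Storm API.
public class IdempotentCounter {
    private final Set<String> seen = new HashSet<>();
    private long processed = 0;

    // Returns true if this message id was counted for the first time,
    // false if it was a replay that had already been counted.
    public boolean record(String messageId) {
        if (seen.add(messageId)) {
            processed++;
            return true;
        }
        return false;
    }

    public long processedCount() {
        return processed;
    }
}
```

Even then, this assumes every tuple carries a unique id and that the counter
state survives executor failures, which in practice it may not.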
  I could not find a solution for this. Storm is intended to run on
unbounded data.
I hope this helps,
Regards,
Florin
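As an aside, the "flush only after signals from all parallel instances" idea
Matthias describes in the quoted thread can be sketched, outside of any Storm
API, as a simple barrier. The class is hypothetical; in a real bolt the
expected count would come from TopologyContext in prepare():

```java
// Framework-free sketch of the "wait for end-of-stream signals from all
// parallel bolt instances before flushing" idea. Hypothetical class;
// not part of the Storm API.
public class FlushBarrier {
    private final int expectedSignals; // e.g. number of parallel bolt instances
    private int received = 0;

    public FlushBarrier(int expectedSignals) {
        this.expectedSignals = expectedSignals;
    }

    // Call once per end-of-stream signal tuple received; returns true
    // once signals from all instances have arrived and it is safe to flush.
    public boolean onSignal() {
        received++;
        return received >= expectedSignals;
    }
}
```

Note the same caveat as above applies: a replayed signal tuple would trip the
barrier early unless signals are also de-duplicated per sender.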


On Sunday, May 8, 2016, Matthias J. Sax <mj...@apache.org> wrote:

> You can get the number of bolt instances from TopologyContext that is
> provided in Bolt.prepare()
>
> Furthermore, you could put a loop into your topology, i.e., a bolt reads
> its own output; if you broadcast (i.e., allGrouping) this
> feedback-loop-stream you can let bolt instances talk to each other.
>
> builder.setBolt("DbBolt", new MyDBBolt())
>        .shuffleGrouping("spout")
>        .allGrouping("flush-stream", "DbBolt");
>
> where "flush-stream" is a second output stream of MyDBBolt() sending a
> notification tuple after it received the end-of-stream from spout;
> furthermore, if a bolt received the signal via "flush-stream" from
> **all** parallel bolt instances, it can flush to DB.
>
> Or something like this... Be creative! :)
>
>
> -Matthias
>
>
> On 05/08/2016 02:26 PM, Navin Ipe wrote:
> > @Matthias: I agree about the batch processor, but my superior took the
> > decision to use Storm, and he visualizes more complexity later for which
> > he needs Storm.
> > I had considered the "end of stream" tuple earlier (my idea was to emit
> > 10 consecutive nulls), but then the question was how do I know how many
> > bolt instances have been created, and how do I notify all the bolts?
> > Because it's only after the last bolt finishes writing to DB, that I
> > have to shut down the topology.
> >
> > @Jason: Thanks. I had seen storm signals earlier (I think from one of
> > your replies to someone else) and I had a look at the code too, but am a
> > bit wary because it's no longer being maintained and because of the
> > issues: https://github.com/ptgoetz/storm-signals/issues
> >
> > On Sun, May 8, 2016 at 5:40 AM, Jason Kusar <ja...@kusar.net> wrote:
> >
> >     You might want to check out Storm Signals.
> >     https://github.com/ptgoetz/storm-signals
> >
> >     It might give you what you're looking for.
> >
> >
> >     On Sat, May 7, 2016, 11:59 AM Matthias J. Sax <mj...@apache.org> wrote:
> >
> >         As you mentioned already: Storm is designed to run topologies
> >         forever ;)
> >         If you have finite data, why do you not use a batch processor???
> >
> >         As a workaround, you can embed "control messages" in your stream
> >         (or use
> >         an additional stream for them).
> >
> >         If you want a topology to shut down itself, you could use
> >         `NimbusClient.getConfiguredClient(conf).getClient().killTopology(name);`
> >         in your spout/bolt code.
> >
> >         Something like:
> >          - Spout emit all tuples
> >          - Spout emit special "end of stream" control tuple
> >          - Bolt1 processes everything
> >          - Bolt1 forwards "end of stream" control tuple (when it received it)
> >          - Bolt2 processes everything
> >          - Bolt2 receives "end of stream" control tuple => flush to DB
> >            => kill topology
> >
> >         But I guess, this is kinda weird pattern.
> >
> >         -Matthias
> >
> >         On 05/05/2016 06:13 AM, Navin Ipe wrote:
> >         > Hi,
> >         >
> >         > I know Storm is designed to run forever. I also know about
> >         Trident's
> >         > technique of aggregation. But shouldn't Storm have a way to
> >         let bolts
> >         > know that a certain bunch of processing has been completed?
> >         >
> >         > Consider this topology:
> >         > Spout------>Bolt-A------>Bolt-B
> >         >             |                  |--->Bolt-B
> >         >             |                  \--->Bolt-B
> >         >             |--->Bolt-A------>Bolt-B
> >         >             |                  |--->Bolt-B
> >         >             |                  \--->Bolt-B
> >         >             \--->Bolt-A------>Bolt-B
> >         >                                |--->Bolt-B
> >         >                                \--->Bolt-B
> >         >
> >         >   * From Bolt-A to Bolt-B, it is a FieldsGrouping.
> >         >   * Spout emits only a few tuples and then stops emitting.
> >         >   * Bolt A takes those tuples and generates millions of tuples.
> >         >
> >         >
> >         > *Bolt-B accumulates tuples that Bolt A sends and needs to know
> >         when
> >         > Spout finished emitting. Only then can Bolt-B start writing to
> >         SQL.*
> >         >
> >         > *Questions:*
> >         > 1. How can all Bolts B be notified that it is time to write to
> >         SQL?
> >         > 2. After all Bolts B have written to SQL, how to know that all
> >         Bolts B
> >         > have completed writing?
> >         > 3. How to stop the topology? I know of
> >         > localCluster.killTopology("HelloStorm"), but shouldn't there
> >         be a way to
> >         > do it from the Bolt?
> >         >
> >         > --
> >         > Regards,
> >         > Navin
> >
> >
> >
> >
> > --
> > Regards,
> > Navin
>
>
