@Matthias: That's genius! I didn't know streams and allGroupings could be
used like that.
Just as Storm introduced tick tuples, it would have been nice if Storm had a
native way of doing all this, but the ideas you've come up with are
extremely good. I'm going to try implementing them right away.
Thank you too Florin!
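
For anyone following along, here is a rough sketch of the bookkeeping I have
in mind for the "all notifications received" check. It is plain Java with no
Storm dependency, and FlushBarrier is a hypothetical helper name of my own,
not a Storm class. The assumption is that each bolt instance (or the single
"shut down bolt") would create one of these in prepare(), with the expected
count taken from something like context.getComponentTasks("DbBolt").size(),
and would feed it tuple.getSourceTask() for every notification tuple.
Keeping a set of task ids rather than a counter is meant to cope with
replayed notifications, since a tuple can be delivered more than once.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not part of Storm): tracks which parallel bolt
 * instances have sent their flush notification and reports when all
 * expected instances have been heard from.
 */
final class FlushBarrier {
    private final int expectedTasks;
    private final Set<Integer> seenTasks = new HashSet<>();
    private boolean released = false;

    FlushBarrier(int expectedTasks) {
        if (expectedTasks <= 0) {
            throw new IllegalArgumentException("expectedTasks must be positive");
        }
        this.expectedTasks = expectedTasks;
    }

    /**
     * Records a notification from the given task id.
     * Returns true exactly once: on the call that completes the set.
     */
    boolean onNotification(int sourceTaskId) {
        // A replayed notification from the same task does not grow the set.
        seenTasks.add(sourceTaskId);
        if (!released && seenTasks.size() == expectedTasks) {
            released = true;
            return true;
        }
        return false;
    }

    /** Number of instances that have not yet sent a notification. */
    int pending() {
        return Math.max(0, expectedTasks - seenTasks.size());
    }
}
```

When onNotification() returns true, a DbBolt would flush to the DB, and the
"shut down bolt" would be the place to call killTopology.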

On Mon, May 9, 2016 at 12:48 AM, Matthias J. Sax <mj...@apache.org> wrote:

> To synchronize this, use an additional "shut down bolt" with a
> parallelism of one. The "shut down bolt" must be notified by all parallel
> DbBolts after they have performed the flush. Once all notifications have
> been received, there are no in-flight messages and thus the "shut down
> bolt" can kill the topology safely.
>
> -Matthias
>
>
>
> On 05/08/2016 07:27 PM, Spico Florin wrote:
> > Hi!
> >   There is this solution of sending a poison pill message from the
> > spout. One bolt will receive your poison pill and will kill the topology
> > via the Storm Nimbus API. One potential issue with this approach is
> > that, due to your topology structure (the parallelism of your bolts and
> > the time they need to execute their business logic), the poison pill
> > may be swallowed by the one bolt responsible for killing the topology
> > before all the other in-flight messages have been processed. The
> > consequence is that you cannot be sure that all the messages sent by
> > the spout were processed. Also, sharing the total number of sent
> > messages between the executors in order to shut down when all messages
> > were processed could be error-prone, since a tuple can be processed
> > many times (depending on your message processing guarantee) or could
> > fail.
> >   I could not find a solution for this. Storm is intended to run
> > on unbounded data.
> > I hope this helps,
> > Regards,
> > Florin
> >
> >
> > On Sunday, May 8, 2016, Matthias J. Sax <mj...@apache.org> wrote:
> >
> >     You can get the number of bolt instances from TopologyContext that is
> >     provided in Bolt.prepare()
> >
> >     Furthermore, you could put a loop into your topology, i.e., a bolt
> >     reads its own output; if you broadcast (i.e., allGrouping) this
> >     feedback-loop-stream, you can let bolt instances talk to each other.
> >
> >     builder.setBolt("DbBolt", new MyDBBolt())
> >            .shuffleGrouping("spout")
> >            .allGrouping("DbBolt", "flush-stream"); // (componentId, streamId)
> >
> >     where "flush-stream" is a second output stream of MyDBBolt() that
> >     sends a notification tuple after it has received the end-of-stream
> >     from the spout; furthermore, once a bolt has received the signal via
> >     "flush-stream" from **all** parallel bolt instances, it can flush to
> >     the DB.
> >
> >     Or something like this... Be creative! :)
> >
> >
> >     -Matthias
> >
> >
> >     On 05/08/2016 02:26 PM, Navin Ipe wrote:
> >     > @Matthias: I agree about the batch processor, but my superior took
> >     > the decision to use Storm, and he foresees more complexity later,
> >     > for which he needs Storm.
> >     > I had considered the "end of stream" tuple earlier (my idea was to
> >     > emit 10 consecutive nulls), but then the question was how do I know
> >     > how many bolt instances have been created, and how do I notify all
> >     > the bolts? Because it's only after the last bolt finishes writing
> >     > to DB that I have to shut down the topology.
> >     >
> >     > @Jason: Thanks. I had seen storm signals earlier (I think from one
> >     > of your replies to someone else) and I had a look at the code too,
> >     > but am a bit wary because it's no longer being maintained and
> >     > because of the issues: https://github.com/ptgoetz/storm-signals/issues
> >     >
> >     > On Sun, May 8, 2016 at 5:40 AM, Jason Kusar <ja...@kusar.net> wrote:
> >     >
> >     >     You might want to check out Storm Signals.
> >     >     https://github.com/ptgoetz/storm-signals
> >     >
> >     >     It might give you what you're looking for.
> >     >
> >     >
> >     >     On Sat, May 7, 2016, 11:59 AM Matthias J. Sax <mj...@apache.org> wrote:
> >     >
> >     >         As you mentioned already: Storm is designed to run
> >     >         topologies forever ;)
> >     >         If you have finite data, why do you not use a batch
> >     >         processor???
> >     >
> >     >         As a workaround, you can embed "control messages" in your
> >     >         stream (or use an additional stream for them).
> >     >
> >     >         If you want a topology to shut down itself, you could use
> >     >
> >     >         `NimbusClient.getConfiguredClient(conf).getClient().killTopology(name);`
> >     >
> >     >         in your spout/bolt code.
> >     >
> >     >         Something like:
> >     >          - Spout emits all tuples
> >     >          - Spout emits special "end of stream" control tuple
> >     >          - Bolt1 processes everything
> >     >          - Bolt1 forwards "end of stream" control tuple (when it
> >     >            has received it)
> >     >          - Bolt2 processes everything
> >     >          - Bolt2 receives "end of stream" control tuple => flush
> >     >            to DB => kill topology
> >     >
> >     >         But I guess this is a kinda weird pattern.
> >     >
> >     >         -Matthias
> >     >
> >     >         On 05/05/2016 06:13 AM, Navin Ipe wrote:
> >     >         > Hi,
> >     >         >
> >     >         > I know Storm is designed to run forever. I also know
> >     >         > about Trident's technique of aggregation. But shouldn't
> >     >         > Storm have a way to let bolts know that a certain bunch
> >     >         > of processing has been completed?
> >     >         >
> >     >         > Consider this topology:
> >     >         > Spout------>Bolt-A------>Bolt-B
> >     >         >             |                  |--->Bolt-B
> >     >         >             |                  \--->Bolt-B
> >     >         >             |--->Bolt-A------>Bolt-B
> >     >         >             |                  |--->Bolt-B
> >     >         >             |                  \--->Bolt-B
> >     >         >             \--->Bolt-A------>Bolt-B
> >     >         >                                |--->Bolt-B
> >     >         >                                \--->Bolt-B
> >     >         >
> >     >         >   * From Bolt-A to Bolt-B, it is a FieldsGrouping.
> >     >         >   * Spout emits only a few tuples and then stops emitting.
> >     >         >   * Bolt A takes those tuples and generates millions of
> >     >         >     tuples.
> >     >         >
> >     >         >
> >     >         > *Bolt-B accumulates tuples that Bolt A sends and needs to
> >     >         > know when Spout finished emitting. Only then can Bolt-B
> >     >         > start writing to SQL.*
> >     >         >
> >     >         > *Questions:*
> >     >         > 1. How can all Bolts B be notified that it is time to
> >     >         >    write to SQL?
> >     >         > 2. After all Bolts B have written to SQL, how to know
> >     >         >    that all Bolts B have completed writing?
> >     >         > 3. How to stop the topology? I know of
> >     >         >    localCluster.killTopology("HelloStorm"), but shouldn't
> >     >         >    there be a way to do it from the Bolt?
> >     >         >
> >     >         > --
> >     >         > Regards,
> >     >         > Navin
> >     >
> >     >
> >     >
> >     >
> >     > --
> >     > Regards,
> >     > Navin
> >
>
>


-- 
Regards,
Navin
