Thanks for all the help. :)
On Wed, Sep 21, 2016 at 11:56 AM, Harsh Choudhary <[email protected]> wrote: > It is real-time. I get streaming JSONs from Kafka. > > > > > On Wed, Sep 21, 2016 at 4:15 AM, Ambud Sharma <[email protected]> > wrote: > >> Is this real-time or batch? >> >> If batch this is perfect for MapReduce or Spark. >> >> If real-time then you should use Spark or Storm Trident. >> >> On Sep 20, 2016 9:39 AM, "Harsh Choudhary" <[email protected]> wrote: >> >>> My use case is that I have a json which contains an array. I need to >>> split that array into multiple jsons and do some computations on them. >>> After that, results from each json has to be used in further calculation >>> altogether and come up with the final result. >>> >>> *Cheers!* >>> >>> Harsh Choudhary / Software Engineer >>> >>> Blog / express.harshti.me >>> >>> [image: Facebook] <https://facebook.com/shry.harsh> [image: Twitter] >>> <https://twitter.com/har_ssh> [image: Google Plus] >>> <https://plus.google.com/107567038912927268680> >>> <https://in.linkedin.com/in/choudharyharsh> [image: Linkedin] >>> <https://in.linkedin.com/in/choudharyharsh> [image: Instagram] >>> <https://instagram.com/harsh.choudhary> >>> <https://www.pinterest.com/shryharsh/>[image: 500px] >>> <https://500px.com/harshchoudhary> [image: github] >>> <https://github.com/shry15harsh> >>> >>> On Tue, Sep 20, 2016 at 9:09 PM, Ambud Sharma <[email protected]> >>> wrote: >>> >>>> What's your use case? >>>> >>>> The complexities can be manage d as long as your tuple branching is >>>> reasonable I.e. 1 tuple creates several other tuples and you need to sync >>>> results between them. >>>> >>>> On Sep 20, 2016 8:19 AM, "Harsh Choudhary" <[email protected]> >>>> wrote: >>>> >>>>> You're right. For that I have to manage the queue and all those >>>>> complexities of timeout. If Storm is not the right place to do this then >>>>> what else? >>>>> >>>>> >>>>> >>>>> On Tue, Sep 20, 2016 at 8:25 PM, Ambud Sharma <[email protected]> >>>>> wrote: >>>>> >>>>>> The correct way is to perform time window aggregation using bucketing. >>>>>> >>>>>> Use the timestamp on your event computed from.various stages and send >>>>>> it to a single bolt where the aggregation happens. You only emit from >>>>>> this >>>>>> bolt once you receive results from both parts. >>>>>> >>>>>> It's like creating a barrier or the join phase of a fork join pool. >>>>>> >>>>>> That said the more important question is is Storm the right place do >>>>>> to this? When you perform time window aggregation you are susceptible to >>>>>> tuple timeouts and have to also deal with making sure your aggregation is >>>>>> idempotent. >>>>>> >>>>>> On Sep 20, 2016 7:49 AM, "Harsh Choudhary" <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> But how would that solve the syncing problem? >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Sep 20, 2016 at 8:12 PM, Alberto São Marcos < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> I would dump the *Bolt-A* results in a shared-data-store/queue and >>>>>>>> have a separate workflow with another spout and Bolt-B draining from >>>>>>>> there >>>>>>>> >>>>>>>> On Tue, Sep 20, 2016 at 9:20 AM, Harsh Choudhary < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> Hi >>>>>>>>> >>>>>>>>> I am thinking of doing the following. >>>>>>>>> >>>>>>>>> Spout subscribed to Kafka and get JSONs. Spout emits the JSONs as >>>>>>>>> individual tuples. >>>>>>>>> >>>>>>>>> Bolt-A has subscribed to the spout. Bolt-A creates multiple JSONs >>>>>>>>> from a json and emits them as multiple streams. >>>>>>>>> >>>>>>>>> Bolt-B receives these streams and do the computation on them. >>>>>>>>> >>>>>>>>> I need to make a cumulative result from all the multiple JSONs >>>>>>>>> (which are emerged from a single JSON) in a Bolt. But a bolt static >>>>>>>>> instance variable is only shared between tasks per worker. How do >>>>>>>>> achieve >>>>>>>>> this syncing process. >>>>>>>>> >>>>>>>>> ---> >>>>>>>>> Spout ---> Bolt-A ---> Bolt-B ---> Final result >>>>>>>>> ---> >>>>>>>>> >>>>>>>>> The final result is per JSON which was read from Kafka. >>>>>>>>> >>>>>>>>> Or is there any other way to achieve this better? >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>> >>> >
