Re: Centralized logging for storm

2017-03-31 Thread Harsh Choudhary
also look up all the other logs near a timestamp. *Cheers!* Harsh Choudhary On Fri, Mar 31, 2017 at 1:16 PM, Shashank Prasad wrote: > Hi folks, > > Storm is a great tool but the logs are all over the place. As you increase > your workers, your log files will increase as well and there

Re: Storm files and folders permissions on Linux

2017-03-30 Thread Harsh Choudhary
Can you show the exact error you are getting? *Cheers!* Harsh Choudhary On Thu, Mar 30, 2017 at 11:39 PM, I PVP wrote: > that is how it is being done as of now: > > sudo chown -R storm:storm /opt/storm > sudo chmod -R 700 /opt/storm > > but still facing some issues while sub

Re: Storm files and folders permissions on Linux

2017-03-30 Thread Harsh Choudhary
It depends on which user you are running Storm as. That user must own the Storm folders. So you need chown rather than chmod. *Cheers!* Harsh Choudhary On Thu, Mar 30, 2017 at 11:24 PM, I PVP wrote: > What are the recommended files/folders permissions for running Storm on >

Re: [storm-kafka] where is stored Kafka Spout consummer's offset?

2017-03-28 Thread Harsh Choudhary
pic-id > > Best regards, > Alexandre Vermeerbergen > > > 2017-03-28 8:12 GMT+02:00 Harsh Choudhary : > >> Storm stores its offsets in the Zookeeper it is connected to. So, you >> won't find the offset information for the Storm clients in the same place >

Re: [storm-kafka] where is stored Kafka Spout consummer's offset?

2017-03-27 Thread Harsh Choudhary
ion in its Zookeeper. *Cheers!* Harsh Choudhary / Software Engineer Blog / express.harshti.me

Re: NullPointerException on startup

2016-11-18 Thread Harsh Choudhary
dified the Kafka spout? *Cheers!* Harsh Choudhary On Sat, Nov 19, 2016 at 1:24 AM, Cuneo, Nicholas wrote: > The spout is initialized during topology submission, so how would you > delay that? Kafka is already running for a long period of time. > > Thanks, > Nick > >

Re: NullPointerException on startup

2016-11-18 Thread Harsh Choudhary
or other tasks it does when it subscribes. > > > > Like I said, this happens occasionally during startup but not reliably and > has nothing to do with my code other than I’m probably acking the received > message faster than the kafka spout can finish initialization. > >

Re: NullPointerException on startup

2016-11-18 Thread Harsh Choudhary
Hi, this happens when some code in a bolt or spout throws a NullPointerException. I suggest using the debugger in your IDE to find out where it is happening. You can try creating a local cluster and running it in the IDE to figure it out easily. It never happens because of Storm, so do not

Storm spout sends next tuple before completion of current

2016-11-06 Thread Harsh Choudhary
Hi, I have a bolt (SPLITTER) which receives data from a Kafka spout. The SPLITTER bolt splits the data and emits it on multiple streams to another bolt (WRANGLER). The WRANGLER takes some time to process some of the data. So before it can emit the data to another stream, the spout sends the ne

Re: Syncing multiple streams to compute final result from a bolt

2016-09-22 Thread Harsh Choudhary
Thanks for all the help. :) On Wed, Sep 21, 2016 at 11:56 AM, Harsh Choudhary wrote: > It is real-time. I get streaming JSONs from Kafka. > > > > > On Wed, Sep 21, 2016 at 4:15 AM, Ambud Sharma > wrote: > >> Is this real-time or batch? >> >> If ba

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Harsh Choudhary
It is real-time. I get streaming JSONs from Kafka. On Wed, Sep 21, 2016 at 4:15 AM, Ambud Sharma wrote: > Is this real-time or batch? > > If batch this is perfect for MapReduce or Spark. > > If real-time then you should use Spark or Storm Trident. > > On Sep 20, 2016 9:39

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Harsh Choudhary
My use case is that I have a JSON which contains an array. I need to split that array into multiple JSONs and do some computations on them. After that, the results from each JSON have to be used together in a further calculation to come up with the final result. *Cheers!* Harsh Choudhary / Software

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Harsh Choudhary
aid the more important question is: is Storm the right place to do > this? When you perform time window aggregation you are susceptible to tuple > timeouts and have to also deal with making sure your aggregation is > idempotent. > > On Sep 20, 2016 7:49 AM, "Harsh Choudhary"

Re: Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Harsh Choudhary
But how would that solve the syncing problem? On Tue, Sep 20, 2016 at 8:12 PM, Alberto São Marcos wrote: > I would dump the *Bolt-A* results in a shared-data-store/queue and have a > separate workflow with another spout and Bolt-B draining from there > > On Tue, Sep 20, 2016 at 9:

Syncing multiple streams to compute final result from a bolt

2016-09-20 Thread Harsh Choudhary
Hi, I am thinking of doing the following. The spout subscribes to Kafka and gets JSONs. The spout emits the JSONs as individual tuples. Bolt-A subscribes to the spout. Bolt-A creates multiple JSONs from one JSON and emits them as multiple streams. Bolt-B receives these streams and does the computation on