Re: Apache Storm Twitter account

2019-06-17 Thread Arun Mahadevan
Flink recently published a blog on their networking model [1]. I believe Storm 2.0 does it better in some aspects with the new threading/back-pressure model and we should come out with something that explains the new model and the tuning parameters. - Arun [1] https://flink.apache.org/2019/06/05/

Re: when are the storm window start?

2018-12-05 Thread Arun Mahadevan
The window start time would be based on which window the events falls into (based on the window length and sliding interval that you set) In the master branch you can access this start and end timestamps. See https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/window

Re: Batch processing.

2018-07-05 Thread Arun Mahadevan
> I'm aware of some ideas so far: one simple idea is killing topology when > spout gets all acknowledge messages and data source has no more data. There would be more work needed to optimize the shuffle and scheduling (batch), but I agree with minor modifications to the Spout API (like having a

Re: Batch processing.

2018-07-05 Thread Arun Mahadevan
Your use case seems to be a simple ETL (read from a data source and write to a Sink), which is very well addressed by Storm. With Storm you don’t necessarily need to split the data into batches, but can continuously load the data into ES. If your data set is bounded, you can just kill the topolo

Re: Can I emit a Map?

2017-11-21 Thread Arun Mahadevan
I think it would be fine as long as you ensure your map is Immutable or at-least its not changed after its constructed. I don’t think there would be any issues with sizes. From: Toy Reply-To: "user@storm.apache.org" Date: Wednesday, November 22, 2017 at 2:50 AM To: "user@storm.apache.org"

Re: Behavior of Storm when buffers fill

2017-09-26 Thread Arun Mahadevan
If Bolt B’s receive queue is full the tuple will be put into an overflow buffer. If back pressure is not enabled it will keep on filling the overflow buffer and eventually cause an OOM. You might also want to see the proposed changes in STORM-2306 where the receive queues are going to be bound

Re: Order of tuples

2017-08-21 Thread Arun Mahadevan
Between two tasks the ordering is maintained. So if the spout and the bolts have a parallelism of one the ordering will be usually maintained. However currently there are some cases where messages are dropped at the source worker if the connection to the destination worker goes down and the mess

Re: is stateful bolts production ready?

2017-08-14 Thread Arun Mahadevan
t; > > > Could you please help me with my questions 2 and 3 if possible? > > > > > > Thanks > > Manusha > > > > *From:* Arun Iyer [mailto:ai...@hortonworks.com] *On Behalf Of *Arun > Mahadevan > *Sent:* 11 August 2017 10:20 > *To:* Wijekoon, Manus

Re: is stateful bolts production ready?

2017-08-11 Thread Arun Mahadevan
ate for the namespace in concern from the persistent store. Again if state persistence is handled by framework, how do we know where to get state from? 3. Are checkpoint related methods called by the same bolt or spout thread? Thanks Manusha From: Arun Iyer

Re: is stateful bolts production ready?

2017-07-24 Thread Arun Mahadevan
The bolt just needs to “put” the values into the Key-Value state that the bolt gets initialized with during “initState”. The framework automatically takes care of saving the state behind the scenes. Theres an example in storm-starter that you might find useful - https://github.com/apache/storm/

Re: Why The BaseWindowedBolt Excuted 2,428,080 But Only acked 282,380

2017-07-03 Thread Arun Mahadevan
The tuples are ack-ed only once they fall out of the window. The ‘executed’ are typically the tuples that are received by the windowed bolt and enqueued for processing. Once the window moves forward, tuples fall out of the window and are acked. Since you have set a timestamp field, the window wi

Re: Spout resume after rebalance

2017-04-17 Thread Arun Mahadevan
During a rebalance [1] nimbus resets the component_executor map and arranges for a “DO_REBALANCE” to be triggered after a delay (see [2]). When nimbus receives the “DO_REBALANCE”, it does the actual rebalance where it computes and makes the new assignments [3]. I would assume the spouts

Re: Why Tasks?

2017-04-04 Thread Arun Mahadevan
Fixing the tasks ensures that state is preserved during a re-balance and the tuples gets routed to the same task id with fields grouping. Users could be storing some state in a bolt (like maintaining some in-memory counter or something) without necessarily using a stateful bolt. If the number of

Re: Delay in CHKPT message for stateful task

2017-03-28 Thread Arun Mahadevan
hat is observed in topologyBuilder code). On Sat, Mar 25, 2017 at 2:42 PM, Arun Mahadevan wrote: The checkpoint tuples have to go through the same queue and follow the tuples emitted before it to make the state consistent across the bolts. When bolt ‘A’ receives a checkpoi

Re: Tuple processing

2017-03-26 Thread Arun Mahadevan
With a single worker, bolts A & B would be receiving reference to the same tuple since they are running in the same JVM (splitter bolt emits it once and the thread/task running A & B get the same instance of the tuple). Now if you mutate any value in the tuple (collection of scores in your case)

Re: Delay in CHKPT message for stateful task

2017-03-25 Thread Arun Mahadevan
The checkpoint tuples have to go through the same queue and follow the tuples emitted before it to make the state consistent across the bolts. When bolt ‘A’ receives a checkpoint (say C1 from the spout), it saves its state (of processing the tuples up to C1) and emits ’C1’ to the next bolt sa

Re: Benchmarking streaming technologies

2017-03-23 Thread Arun Mahadevan
You can take a look at the Yahoo streaming benchmark. https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at Regards, Arun From: Giselle van Dongen Reply-To: "user@storm.apache.org" Date: Thursday, March 23, 2017 at 3:35 PM To: "user@storm.apache.

Re: Converting storm tuple to bytearray

2017-03-21 Thread Arun Mahadevan
replay value in the tuple to the older taskID or StreamID etc. all those details will be lost . (I am doing this for replaying tuples after state migration.) On Tue, Mar 21, 2017 at 9:31 AM, Arun Mahadevan wrote: Storm uses Kryo to serialize Tuples. Check this https://github.com/apache/sto

Re: Converting storm tuple to bytearray

2017-03-20 Thread Arun Mahadevan
Storm uses Kryo to serialize Tuples. Check this https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/serialization/KryoTupleSerializer.java Instead of serializing the entire tuple yourself may be you just want to serialize the relevant values within the tuple.

Re: Writing custom Serializer Stateful bolt

2017-03-11 Thread Arun Mahadevan
Is your kryo instance wrapped in a thread local like in the DefaultStateSerializer? Also see - https://github.com/EsotericSoftware/kryo#threading Arun From: anshu shukla Reply-To: "user@storm.apache.org" Date: Saturday, March 11, 2017 at 8:16 PM To: "user@storm.apache.org" Subject:

Re: Serializing states of multiple and different class types (Stateful bolts)

2017-03-04 Thread Arun Mahadevan
Internally the DefaultStateSerializer uses Kryo, so it should be able to handle the different types automatically and its optional to provide a state provider config. The classes you provide the keyClass and valueClass are registered with Kryo (right now it accepts only one entry, but it sh

Re: Rebalancing Stateful bolts in storm 1.0.2

2017-02-19 Thread Arun Mahadevan
This is expected with in-memory state, which stores the state in a local hash map and is not intended for any real use cases. And I don’t think there is any value in serializing the in-memory state during rebalance. How would you resurrect the state if the task gets reassigned to a different hos

Re: implmement state management with Apache Ignite

2017-02-15 Thread Arun Mahadevan
m 1.0.2 only provide a simple KeyValueState interface, We are still considering whether it can fulfill our requirements. Thanks Shawn On 02/7/2017 14:08,Arun Mahadevan wrote: > #1 when a worker is died or killed by manually, storm framework will restart > this worker, is there a

Re: Fault tolerance for stateful operator

2017-02-13 Thread Arun Mahadevan
> 1- Does scaling up/down using rebalance will maintain the state in case of > stateful task proposed in storm 1.0.2. The current rebalance and state logic takes care of this. However, if we do dynamic task scaling in future, the state migration also needs to be taken care of. Thanks, Aru

Re: implmement state management with Apache Ignite

2017-02-07 Thread Arun Mahadevan
ask ? 2- Whats is the impact fo rebalance operation on these bolts. On Tue, Feb 7, 2017 at 11:40 AM, Arun Mahadevan wrote: > #1 when a worker is died or killed by manually, storm framework will restart > this worker, is there a ID which doesn't change for the new worker and the

Re: implmement state management with Apache Ignite

2017-02-06 Thread Arun Mahadevan
> #1 when a worker is died or killed by manually, storm framework will restart > this worker, is there a ID which doesn't change for the new worker and the > died worker? if there is, how to get it? Task Id does not change. The worker can be restarted in the same or different host and the task

Re: Good examples of StatefulWindowedBoltExecutor

2017-01-27 Thread Arun Mahadevan
There is an example here - https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/storm/starter/StatefulWindowingTopology.java. Also checkout the “Stateful windowing” section under https://community.hortonworks.com/articles/14171/windowing-and-state-checkpointing-in-apache-st

Re: simple question about grouping

2017-01-23 Thread Arun Mahadevan
uot; Date: Monday, January 23, 2017 at 4:58 PM To: "user@storm.apache.org" Subject: Re: simple question about grouping Many thanks , but how and when can i decide that this number is perfect form me or not ? On Mon, Jan 23, 2017 at 1:27 PM, Arun Mahadevan wrote: > builder.setBo

Re: simple question about grouping

2017-01-23 Thread Arun Mahadevan
new mySpout(), 1); builder.setBolt("MyBolt", new MyBolt(), 4).shuffleGrouping("MySpout"); i found this example but couldn't know why he use number 4 ? On Mon, Jan 23, 2017 at 1:13 PM, sam mohel wrote: thanks for replying On Mon, Jan 23, 2017 at 1:14 PM, Arun

Re: simple question about grouping

2017-01-23 Thread Arun Mahadevan
Grouping makes sense only when you have more than one task for a bolt. If your bolt has more than one task, then the grouping will decide how the tuples from the spout are distributed to the individual tasks of the bolt. (shuffe = random, fields = keyed on some field and so on). See http

Re: Storm BaseStatefulBolt not working

2016-12-15 Thread Arun Mahadevan
> A CheckpointSpout is active and checkpoints the state like it should, but the > state isn't synchronized between the instances of the StatefulBolt. The state is private to the bolt instance (task). Each bolt task’s state is independent and is not synchronized between the instances of the bo

Re: Stateful bolts

2016-11-03 Thread Arun Mahadevan
Hi Abhishek, Right now delete/clear is not supported, so you need to workaround this by putting ‘null’ values. Clear/delete will be added in future. There’s a pending patch to support delete (https://github.com/apache/storm/pull/1470). Thanks, Arun From: Abhishek Raj Reply-To:

Re: When to use MemoryMapState while performing a persistentAggregate in Trident?

2016-10-26 Thread Arun Mahadevan
MemoryMapState is more for testing and does not provide any persistence. It uses a HashMap internally. If you want persistence you need use the one based on redis or other. Thanks, Arun From: Dinesh Babu K G Reply-To: "user@storm.apache.org" Date: Wednesday, October 26, 2016 at 2:31

Re: windowing & max spout pending

2016-09-01 Thread Arun Mahadevan
e) between the >component instances, then the topology are going to stop processing >messages, as the watermarks are emitted on a per component basis. > >Unless I'm mistaken somewhere, the max.spout.pending should be higher >than (parallelism * (number of tuples in window length + sl

Re: windowing & max spout pending

2016-08-24 Thread Arun Mahadevan
Hi Balazs, Tuples are acked only when the window slides and the events fall out of the window. So max.spout.pending should be more than max number of tuples in window length + sliding interval. Thanks, Arun On 8/24/16, 8:33 PM, "Balázs Kossovics" wrote: >Hello, > >I've recently hit a pro

Re: Trident HBaseState query and update ordering

2016-07-19 Thread Arun Mahadevan
If I understand correctly, the issue is that your trident topology queries the same state that’s being updated to compute the result. You can control the number of batches that trident processes simultaneously by adjusting the value of “topology.max.spout.pending”, if you set it to 1 the proc

Re: Question on Storm 1.0 State Management

2016-07-19 Thread Arun Mahadevan
nstance right ? Any suggestion on handling such cases ? Thanks, Jins George On Mon, Jul 18, 2016 at 9:17 PM, Arun Mahadevan wrote: Each bolt instance (task) has its own state, so in your case each of the 5 instances would have its own state. All these state instances could be in a same underly

Re: Question on Storm 1.0 State Management

2016-07-18 Thread Arun Mahadevan
Each bolt instance (task) has its own state, so in your case each of the 5 instances would have its own state. All these state instances could be in a same underlying storage instance (e.g. same Redis cluster). Thanks, Arun From: Jins George Reply-To: "user@storm.apache.org" Date

Re: About count windows behavior

2016-07-12 Thread Arun Mahadevan
It may be a bug. You can raise an issue here - https://issues.apache.org/jira/browse/STORM If you just want count based windows (without accounting for event time/watermarks) you don’t need to set withTimestampField() and it should work. Thanks, Arun From: Lorenzo Affetti Reply-To:

Re: Tumbling Count and Time-Based WindowedBolt

2016-06-06 Thread Arun Mahadevan
Hi Cody, Tumbling window can be either count or time based, not both. I think what you want is a count based window with a time based sliding interval. The window will activate every sliding interval and will give you the last “count” events. You can use the “getNew” if you want only the new e

Re: State Checkpointing & spout state

2016-05-18 Thread Arun Mahadevan
POLGY_MESSAGE_TIMEOUT)? Many thanks for your help. Olivier. On Tue, May 17, 2016 at 2:12 PM, Arun Mahadevan wrote: Hi Oliver, The state checkpointing currently does not checkpoint the state of the spout. It checkpoints the states of all the bolts and once that’s successful, the tuples emitted

Re: State Checkpointing & spout state

2016-05-17 Thread Arun Mahadevan
Hi Oliver, The state checkpointing currently does not checkpoint the state of the spout. It checkpoints the states of all the bolts and once that’s successful, the tuples emitted by the spout are acked. So currently it provides at-least once guarantee. In the ack method of the spout, you can

Re: initState method not invoked in Storm 1.0

2016-04-15 Thread Arun Mahadevan
rds, Alex On Apr 15, 2016 11:16 AM, "Arun Mahadevan" wrote: Ah, I see what you mean. The “setBolt” method without parallelism hint is not overloaded for stateful bolts so if parallelism hint is not specified it ends up as being normal bolt. Will raise a JIRA for fixing this. Spi

Re: initState method not invoked in Storm 1.0

2016-04-15 Thread Arun Mahadevan
ored tuples (that way the the exception was complaining)? I look forward for your answers, Florin On Fri, Apr 15, 2016 at 12:16 PM, Arun Mahadevan wrote: Ah, I see what you mean. The “setBolt” method without parallelism hint is not overloaded for stateful bolts so if parallelism hint is not speci

Re: initState method not invoked in Storm 1.0

2016-04-15 Thread Arun Mahadevan
etBolt method overload by mistake, since stateful bolts are supertypes of stateless ones. Regards Alex On Apr 15, 2016 10:54 AM, "Arun Mahadevan" wrote: Its the same method (builder.setBolt) that adds stateful bolts to a topology. Heres an example - https://github.com/apache/storm/blob/ma

Re: initState method not invoked in Storm 1.0

2016-04-15 Thread Arun Mahadevan
Its the same method (builder.setBolt) that adds stateful bolts to a topology. Heres an example - https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/storm/starter/StatefulTopology.java Spico, Do you see any errors in the logs ? You might want to turn on debug logs and se

Re: Storm 1.0.0 Windowing by id

2016-04-04 Thread Arun Mahadevan
f all ids? Thank you! Filipa From: Arun Iyer [mailto:ai...@hortonworks.com] On Behalf Of Arun Mahadevan Sent: 2 de abril de 2016 20:33 To: user@storm.apache.org Subject: Re: Storm 1.0.0 Windowing by id Hi Filipa, Yes, you could have a separate stream per id and have a separate wind

Re: Aw: Re: Combining group by and time window

2016-04-02 Thread Arun Mahadevan
Hi Daniela, > Okay, could I do the grouping already in Kafka? For example would it be > possible to use one topic per region or to use one topic with a partition for > every region? Then the messages would already be grouped when the arrive at > Storm. Is this correct? You would need a kafka s

Re: Storm 1.0.0 Windowing by id

2016-04-02 Thread Arun Mahadevan
Hi Filipa, Yes, you could have a separate stream per id and have a separate windowed bolt subscribe to the corresponding stream. The other option is to do the id based grouping within the windowed bolt each time its "execute" method is triggered with the tuples in the current window. Thanks,

Re: Intermittent error with stateful topology

2016-03-10 Thread Arun Mahadevan
Hi Alexander, Can you turn on debug logs and see if the logs have any more information ? What is your topology like ? You might want to file a JIRA and upload the debug logs. Thanks, Arun From: Alexander T Reply-To: "user@storm.apache.org" Date: Thursday, March 10, 2016 at 7:28 PM To: "us

Re: Stateful bolts and acking

2016-03-08 Thread Arun Mahadevan
d if I replace it with a simple stateful bolt, the acking starts working as expected again. Is there support for non-stateful windowed bolts? Best regards Alexander On Mon, Mar 7, 2016 at 12:16 PM, Arun Mahadevan wrote: Hi Alexander, You are right, the acking for the non-stateful bolts in a sta

Re: Stateful bolts and acking

2016-03-07 Thread Arun Mahadevan
use bolts which were written for stateless topologies in stateful ones. Are there any plans of adapters or to change the interface so that they can interoperate? Best regards Alexander On Fri, Mar 4, 2016 at 7:26 PM, Arun Mahadevan wrote: Hi Alexander, The simple topology like the one you h

Re: Stateful bolts and acking

2016-03-04 Thread Arun Mahadevan
ght have missed? Regards, Alexander On Mar 4, 2016 5:30 PM, "Arun Mahadevan" wrote: Hi Alexander, For a stateful topology the anchoring and acking is automatically taken care of. Can you check if any of your bolts inherit BaseBasicBolt or if you are manually acking. Your no

Re: Stateful bolts and acking

2016-03-04 Thread Arun Mahadevan
Hi Alexander, For a stateful topology the anchoring and acking is automatically taken care of. Can you check if any of your bolts inherit BaseBasicBolt or if you are manually acking. Your non-stateful bolts could inherit from BaseRichBolt instead. Thanks, Arun From: Alexander T Reply-To: