Flink recently published a blog on their networking model [1].
I believe Storm 2.0 does it better in some aspects with the new
threading/back-pressure model and we should come out with something that
explains the new model and the tuning parameters.
- Arun
[1] https://flink.apache.org/2019/06/05/
The window start time would be based on which window the event falls into
(based on the window length and sliding interval that you set).
In the master branch you can access these start and end timestamps.
See
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/window
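To make the window-assignment rule concrete, here is a small standalone sketch (illustrative only, not Storm's actual windowing code; the method name and the alignment of window boundaries to the sliding interval are assumptions):

```java
import java.util.ArrayList;
import java.util.List;

public class WindowMath {
    // Returns the start timestamps of every window containing `ts`,
    // for a window of `length` sliding every `interval` time units.
    // Window end times are assumed aligned to multiples of the interval.
    static List<Long> windowStartsFor(long ts, long length, long interval) {
        List<Long> starts = new ArrayList<>();
        // first window end at or after ts, aligned to the sliding interval
        long firstEnd = ((ts / interval) + 1) * interval;
        for (long end = firstEnd; end < ts + length; end += interval) {
            starts.add(end - length); // window covers [end - length, end)
        }
        return starts;
    }

    public static void main(String[] args) {
        // event at t=12, window length 10, sliding interval 5:
        // it sits in the windows ending at 15 and 20
        System.out.println(windowStartsFor(12, 10, 5)); // [5, 10]
    }
}
```

With length equal to the interval this degenerates to a tumbling window, where each event falls into exactly one window.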
> I'm aware of some ideas so far: one simple idea is killing topology when
> spout gets all acknowledge messages and data source has no more data.
There would be more work needed to optimize the shuffle and scheduling (batch),
but I agree with minor modifications to the Spout API (like having a
Your use case seems to be a simple ETL (read from a data source and write to a
Sink), which is very well addressed by Storm. With Storm you don’t necessarily
need to split the data into batches, but can continuously load the data into
ES. If your data set is bounded, you can just kill the topology
I think it would be fine as long as you ensure your map is immutable or
at least it's not changed after it's constructed. I don’t think there would be
any issues with sizes.
From: Toy
Reply-To: "user@storm.apache.org"
Date: Wednesday, November 22, 2017 at 2:50 AM
To: "user@storm.apache.org"
If Bolt B’s receive queue is full the tuple will be put into an overflow
buffer. If back pressure is not enabled it will keep on filling the overflow
buffer and eventually cause an OOM.
You might also want to see the proposed changes in STORM-2306 where the receive
queues are going to be bounded.
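The difference between the two behaviors can be sketched with plain Java queues (a simplified model, not Storm's internals; the class and method names here are made up for illustration):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

public class OverflowSketch {
    // Feed `total` tuples into a bounded receive queue of `capacity`;
    // anything that doesn't fit goes to an unbounded overflow buffer.
    // Returns {queued, overflowed}.
    static int[] feed(int total, int capacity) {
        ArrayBlockingQueue<Integer> receive = new ArrayBlockingQueue<>(capacity);
        Queue<Integer> overflow = new ArrayDeque<>(); // grows without bound -> OOM risk
        for (int tuple = 0; tuple < total; tuple++) {
            if (!receive.offer(tuple)) {
                // without back pressure, nothing slows the producer down,
                // so this buffer just keeps growing
                overflow.add(tuple);
            }
        }
        return new int[]{receive.size(), overflow.size()};
    }

    public static void main(String[] args) {
        int[] r = feed(10, 4);
        System.out.println("queued=" + r[0] + " overflowed=" + r[1]); // queued=4 overflowed=6
    }
}
```

With back pressure (or a bounded overflow buffer), the producer would block or throttle at the point where `offer` fails instead of accumulating tuples.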
Between two tasks the ordering is maintained. So if the spout and the bolts
have a parallelism of one, the ordering will usually be maintained. However
currently there are some cases where messages are dropped at the source worker
if the connection to the destination worker goes down and the mess
>
> Could you please help me with my questions 2 and 3 if possible?
>
> Thanks
> Manusha
>
> *From:* Arun Iyer [mailto:ai...@hortonworks.com] *On Behalf Of *Arun
> Mahadevan
> *Sent:* 11 August 2017 10:20
> *To:* Wijekoon, Manus
ate for the namespace in concern from the
persistent store. Again, if state persistence is handled by the framework, how do
we know where to get the state from?
3. Are checkpoint related methods called by the same bolt or spout thread?
Thanks
Manusha
From: Arun Iyer
The bolt just needs to “put” the values into the Key-Value state that the bolt
gets initialized with during “initState”. The framework automatically takes
care of saving the state behind the scenes.
There's an example in storm-starter that you might find useful -
https://github.com/apache/storm/
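The put-only pattern described above can be sketched with a plain map standing in for the framework-provided key-value state (a stand-in, not the real API; in a real topology the bolt would extend Storm's stateful-bolt base class and receive its KeyValueState in initState):

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {
    // stands in for the KeyValueState the framework hands to initState
    private Map<String, Long> state;

    // called once by the framework before any execute calls
    void initState(Map<String, Long> kvState) {
        this.state = kvState;
    }

    void execute(String word) {
        // the bolt only "put"s values; snapshotting and persistence
        // behind the scenes are the framework's job
        state.put(word, state.getOrDefault(word, 0L) + 1);
    }

    long count(String word) {
        return state.getOrDefault(word, 0L);
    }

    public static void main(String[] args) {
        WordCountSketch bolt = new WordCountSketch();
        bolt.initState(new HashMap<>());
        for (String w : new String[]{"a", "b", "a"}) bolt.execute(w);
        System.out.println(bolt.count("a")); // 2
    }
}
```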
The tuples are ack-ed only once they fall out of the window. The ‘executed’ count
typically reflects the tuples that are received by the windowed bolt and enqueued for
processing. Once the window moves forward, tuples fall out of the window and
are acked. Since you have set a timestamp field, the window wi
During a rebalance [1] nimbus resets the component_executor map and arranges
for a “DO_REBALANCE” to be triggered after a delay (see [2]).
When nimbus receives the “DO_REBALANCE”, it does the actual rebalance where it
computes and makes the new assignments [3].
I would assume the spouts
Fixing the tasks ensures that state is preserved during a rebalance and the
tuples get routed to the same task id with fields grouping. Users could be
storing some state in a bolt (like maintaining some in-memory counter or
something) without necessarily using a stateful bolt. If the number of
hat is observed in
topologyBuilder code).
On Sat, Mar 25, 2017 at 2:42 PM, Arun Mahadevan wrote:
With a single worker, bolts A & B would be receiving a reference to the same
tuple since they are running in the same JVM (the splitter bolt emits it once and
the threads/tasks running A & B get the same instance of the tuple). Now if you
mutate any value in the tuple (collection of scores in your case)
The checkpoint tuples have to go through the same queue and follow the tuples
emitted before it to make the state consistent across the bolts.
When bolt ‘A’ receives a checkpoint (say C1 from the spout), it saves its state
(of processing the tuples up to C1) and emits ’C1’ to the next bolt sa
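The ordering guarantee this relies on can be sketched as follows (a simplified model, not Storm's checkpointing code; the names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointOrdering {
    static final String CHECKPOINT = "C1";

    // Returns the state snapshot taken when the checkpoint marker is seen:
    // exactly the tuples that preceded it in the same queue, nothing after.
    static List<String> snapshotAtCheckpoint(List<String> queue) {
        List<String> processed = new ArrayList<>();
        List<String> snapshot = new ArrayList<>();
        for (String tuple : queue) {
            if (CHECKPOINT.equals(tuple)) {
                snapshot = new ArrayList<>(processed); // save state up to C1
                // ...and forward C1 to the next bolt on the same stream
            } else {
                processed.add(tuple);
            }
        }
        return snapshot;
    }

    public static void main(String[] args) {
        System.out.println(snapshotAtCheckpoint(List.of("t1", "t2", CHECKPOINT, "t3")));
        // [t1, t2] -- t3 arrived after C1, so it is not in the snapshot
    }
}
```

Because the checkpoint tuple cannot overtake the data tuples in the queue, every bolt's snapshot covers the same prefix of the stream, which is what makes the state consistent across bolts.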
You can take a look at the Yahoo streaming benchmark.
https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
Regards,
Arun
From: Giselle van Dongen
Reply-To: "user@storm.apache.org"
Date: Thursday, March 23, 2017 at 3:35 PM
To: "user@storm.apache.
replay value in the tuple to the older taskID or StreamID
etc., all those details will be lost. (I am doing this for replaying tuples
after state migration.)
On Tue, Mar 21, 2017 at 9:31 AM, Arun Mahadevan wrote:
Storm uses Kryo to serialize Tuples. Check this
https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/serialization/KryoTupleSerializer.java
Instead of serializing the entire tuple yourself, maybe you just want to
serialize the relevant values within the tuple.
Is your kryo instance wrapped in a thread local like in the
DefaultStateSerializer?
Also see - https://github.com/EsotericSoftware/kryo#threading
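The thread-local pattern referenced above looks roughly like this (a sketch of the ThreadLocal wrapping, not Storm's DefaultStateSerializer itself; a StringBuilder stands in for the non-thread-safe Kryo instance):

```java
public class ThreadLocalSerializer {
    // one "serializer" per thread, created lazily on first use;
    // Kryo instances are not thread safe, so they must not be shared
    private static final ThreadLocal<StringBuilder> SERIALIZER =
            ThreadLocal.withInitial(StringBuilder::new);

    static String serialize(String value) {
        StringBuilder kryoStandIn = SERIALIZER.get();
        kryoStandIn.setLength(0); // safe to reuse: no other thread shares it
        return kryoStandIn.append(value).reverse().toString();
    }

    public static void main(String[] args) throws Exception {
        Thread t1 = new Thread(() -> System.out.println(serialize("abc")));
        Thread t2 = new Thread(() -> System.out.println(serialize("xyz")));
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}
```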
Arun
From: anshu shukla
Reply-To: "user@storm.apache.org"
Date: Saturday, March 11, 2017 at 8:16 PM
To: "user@storm.apache.org"
Subject:
Internally the DefaultStateSerializer uses Kryo, so it should be able to handle
the different types automatically, and it's optional to provide a state provider
config.
The classes you provide as the keyClass and valueClass are registered with Kryo
(right now it accepts only one entry, but it sh
This is expected with in-memory state, which stores the state in a local hash
map and is not intended for any real use cases. And I don’t think there is any
value in serializing the in-memory state during rebalance. How would you
resurrect the state if the task gets reassigned to a different hos
m 1.0.2 only provides a simple KeyValueState interface. We are
still considering whether it can fulfill our requirements.
Thanks
Shawn
On 02/7/2017 14:08,Arun Mahadevan wrote:
> #1 when a worker is died or killed by manually, storm framework will restart
> this worker, is there a
> 1- Will scaling up/down using rebalance maintain the state in case of a
> stateful task proposed in Storm 1.0.2?
The current rebalance and state logic takes care of this. However, if we do
dynamic task scaling in future, the state migration also needs to be taken care
of.
Thanks,
Aru
ask ?
2- What is the impact of the rebalance operation on these bolts?
On Tue, Feb 7, 2017 at 11:40 AM, Arun Mahadevan wrote:
> #1 when a worker dies or is killed manually, the storm framework will restart
> this worker; is there an ID which doesn't change between the new worker and the
> dead worker? if there is, how to get it?
Task Id does not change. The worker can be restarted in the same or different
host and the task
There is an example here -
https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/storm/starter/StatefulWindowingTopology.java.
Also checkout the “Stateful windowing” section under
https://community.hortonworks.com/articles/14171/windowing-and-state-checkpointing-in-apache-st
Date: Monday, January 23, 2017 at 4:58 PM
To: "user@storm.apache.org"
Subject: Re: simple question about grouping
Many thanks, but how and when can I decide whether this number is right for me
or not?
On Mon, Jan 23, 2017 at 1:27 PM, Arun Mahadevan wrote:
> builder.setBo
builder.setSpout("MySpout", new mySpout(), 1);
builder.setBolt("MyBolt", new MyBolt(), 4).shuffleGrouping("MySpout");
I found this example but couldn't understand why the number 4 is used.
On Mon, Jan 23, 2017 at 1:13 PM, sam mohel wrote:
thanks for replying
On Mon, Jan 23, 2017 at 1:14 PM, Arun
Grouping makes sense only when you have more than one task for a bolt. If your
bolt has more than one task, then the grouping will decide how the tuples from
the spout are distributed to the individual tasks of the bolt (shuffle =
random, fields = keyed on some field, and so on).
See http
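A rough model of how a fields grouping picks a task for each tuple (illustrative only; Storm's actual hashing may differ):

```java
public class FieldsGroupingSketch {
    // Same key -> same task index, which is what a fields grouping
    // guarantees. Math.floorMod keeps the index non-negative even for
    // negative hash codes. A shuffle grouping would instead pick a task
    // at random.
    static int taskFor(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int tasks = 4; // e.g. builder.setBolt("MyBolt", new MyBolt(), 4)
        System.out.println(taskFor("user-42", tasks) == taskFor("user-42", tasks)); // true
    }
}
```

This also shows why grouping only matters with more than one task: with `numTasks == 1`, every key maps to task 0 regardless of the grouping.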
> A CheckpointSpout is active and checkpoints the state like it should, but the
> state isn't synchronized between the instances of the StatefulBolt.
The state is private to the bolt instance (task). Each bolt task’s state is
independent and is not synchronized between the instances of the bo
Hi Abhishek,
Right now delete/clear is not supported, so you need to work around this by
putting ‘null’ values.
Clear/delete will be added in future. There’s a pending patch to support delete
(https://github.com/apache/storm/pull/1470).
Thanks,
Arun
From: Abhishek Raj
Reply-To:
MemoryMapState is more for testing and does not provide any persistence. It
uses a HashMap internally. If you want persistence you need to use the one based
on redis or other.
Thanks,
Arun
From: Dinesh Babu K G
Reply-To: "user@storm.apache.org"
Date: Wednesday, October 26, 2016 at 2:31
>e) between the
>component instances, then the topology is going to stop processing
>messages, as the watermarks are emitted on a per-component basis.
>
>Unless I'm mistaken somewhere, the max.spout.pending should be higher
>than (parallelism * (number of tuples in window length + sl
Hi Balazs,
Tuples are acked only when the window slides and the events fall out of the
window.
So max.spout.pending should be more than the maximum number of tuples in the
window length + sliding interval.
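With assumed example numbers, the bound works out like this (a back-of-the-envelope helper, not a Storm API; the parameter names and figures are made up for illustration):

```java
public class SpoutPendingBound {
    // max.spout.pending must exceed the tuples that can be "in flight"
    // inside windows across all windowed-bolt tasks, since those tuples
    // are only acked once they fall out of the window.
    static int minPending(int parallelism, int tuplesInWindow, int tuplesInSlide) {
        return parallelism * (tuplesInWindow + tuplesInSlide);
    }

    public static void main(String[] args) {
        // 4 windowed-bolt tasks, ~100 tuples per window length, ~20 per slide
        int bound = minPending(4, 100, 20);
        System.out.println("set topology.max.spout.pending > " + bound); // > 480
    }
}
```

If max.spout.pending is below this bound, the spout stalls waiting for acks that can only arrive after the window slides, and the topology stops making progress.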
Thanks,
Arun
On 8/24/16, 8:33 PM, "Balázs Kossovics" wrote:
>Hello,
>
>I've recently hit a pro
If I understand correctly, the issue is that your trident topology queries the
same state that’s being updated to compute the result. You can control the
number of batches that trident processes simultaneously by adjusting the value
of “topology.max.spout.pending”, if you set it to 1 the proc
nstance right ?
Any suggestion on handling such cases ?
Thanks,
Jins George
On Mon, Jul 18, 2016 at 9:17 PM, Arun Mahadevan wrote:
Each bolt instance (task) has its own state, so in your case each of the 5
instances would have its own state. All these state instances could be in a
same underlying storage instance (e.g. same Redis cluster).
Thanks,
Arun
From: Jins George
Reply-To: "user@storm.apache.org"
Date
It may be a bug. You can raise an issue here -
https://issues.apache.org/jira/browse/STORM
If you just want count based windows (without accounting for event
time/watermarks) you don’t need to set withTimestampField() and it should work.
Thanks,
Arun
From: Lorenzo Affetti
Reply-To:
Hi Cody,
A tumbling window can be either count or time based, not both.
I think what you want is a count based window with a time based sliding
interval. The window will activate every sliding interval and will give you the
last “count” events. You can use the “getNew” if you want only the new e
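The behavior can be sketched with a deque (illustrative only, not the real TupleWindow; the "getNew"-style semantics are modeled by the newEvents method, and the class and method names are assumptions):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class CountWindowSketch {
    private final int count;
    private final Deque<String> window = new ArrayDeque<>();
    private final List<String> fresh = new ArrayList<>();

    CountWindowSketch(int count) { this.count = count; }

    void add(String event) {
        window.addLast(event);
        fresh.add(event);
        if (window.size() > count) window.removeFirst(); // oldest falls out
    }

    // what get() would return on activation: the last `count` events
    List<String> get() { return new ArrayList<>(window); }

    // what getNew() would return: only events since the last activation
    List<String> newEvents() {
        List<String> out = new ArrayList<>(fresh);
        fresh.clear();
        return out;
    }

    public static void main(String[] args) {
        CountWindowSketch w = new CountWindowSketch(3);
        w.add("a"); w.add("b"); w.add("c");
        System.out.println(w.get());       // [a, b, c]
        System.out.println(w.newEvents()); // [a, b, c]
        w.add("d");
        System.out.println(w.get());       // [b, c, d]
        System.out.println(w.newEvents()); // [d]
    }
}
```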
POLGY_MESSAGE_TIMEOUT)?
Many thanks for your help.
Olivier.
On Tue, May 17, 2016 at 2:12 PM, Arun Mahadevan wrote:
Hi Oliver,
The state checkpointing currently does not checkpoint the state of the spout.
It checkpoints the states of all the bolts and once that’s successful, the
tuples emitted by the spout are acked. So currently it provides at-least once
guarantee.
In the ack method of the spout, you can
rds,
Alex
On Apr 15, 2016 11:16 AM, "Arun Mahadevan" wrote:
Ah, I see what you mean. The “setBolt” method without parallelism hint is not
overloaded for stateful bolts so if parallelism hint is not specified it ends
up as being normal bolt. Will raise a JIRA for fixing this.
Spi
ored tuples (that way the exception was complaining)?
I look forward for your answers,
Florin
On Fri, Apr 15, 2016 at 12:16 PM, Arun Mahadevan wrote:
Ah, I see what you mean. The “setBolt” method without parallelism hint is not
overloaded for stateful bolts so if parallelism hint is not speci
etBolt method overload by
mistake, since stateful bolts are supertypes of stateless ones.
Regards
Alex
On Apr 15, 2016 10:54 AM, "Arun Mahadevan" wrote:
It's the same method (builder.setBolt) that adds stateful bolts to a topology.
Here's an example -
https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/storm/starter/StatefulTopology.java
Spico,
Do you see any errors in the logs? You might want to turn on debug logs and
se
f all ids?
Thank you!
Filipa
From: Arun Iyer [mailto:ai...@hortonworks.com] On Behalf Of Arun Mahadevan
Sent: 2 de abril de 2016 20:33
To: user@storm.apache.org
Subject: Re: Storm 1.0.0 Windowing by id
Hi Filipa,
Yes, you could have a separate stream per id and have a separate wind
Hi Daniela,
> Okay, could I do the grouping already in Kafka? For example, would it be
> possible to use one topic per region or to use one topic with a partition for
> every region? Then the messages would already be grouped when they arrive at
> Storm. Is this correct?
You would need a kafka s
Hi Filipa,
Yes, you could have a separate stream per id and have a separate windowed bolt
subscribe to the corresponding stream.
The other option is to do the id based grouping within the windowed bolt each
time its "execute" method is triggered with the tuples in the current window.
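That second option can be sketched like this (a plain String[] pair stands in for a Storm Tuple, and the names are illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByIdSketch {
    // Each "tuple" is a {id, value} pair; inside a windowed bolt's
    // execute, a real implementation would pull the id field out of
    // each Tuple in the current window instead.
    static Map<String, List<String[]>> groupById(List<String[]> window) {
        return window.stream().collect(Collectors.groupingBy(t -> t[0]));
    }

    public static void main(String[] args) {
        List<String[]> window = List.of(
                new String[]{"a", "1"}, new String[]{"b", "2"}, new String[]{"a", "3"});
        // process each id's events as a group
        System.out.println(groupById(window).get("a").size()); // 2
    }
}
```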
Thanks,
Hi Alexander,
Can you turn on debug logs and see if the logs have any more information? What
is your topology like?
You might want to file a JIRA and upload the debug logs.
Thanks,
Arun
From: Alexander T
Reply-To: "user@storm.apache.org"
Date: Thursday, March 10, 2016 at 7:28 PM
To: "us
d if I replace
it with a simple stateful bolt, the acking starts working as expected again. Is
there support for non-stateful windowed bolts?
Best regards
Alexander
On Mon, Mar 7, 2016 at 12:16 PM, Arun Mahadevan wrote:
Hi Alexander,
You are right, the acking for the non-stateful bolts in a sta
use bolts which were written for
stateless topologies in stateful ones. Are there any plans of adapters or to
change the interface so that they can interoperate?
Best regards
Alexander
On Fri, Mar 4, 2016 at 7:26 PM, Arun Mahadevan wrote:
Hi Alexander,
The simple topology like the one you h
ght have missed?
Regards,
Alexander
On Mar 4, 2016 5:30 PM, "Arun Mahadevan" wrote:
Hi Alexander,
For a stateful topology the anchoring and acking is automatically taken care
of.
Can you check if any of your bolts inherit BaseBasicBolt or if you are manually
acking? Your non-stateful bolts could inherit from BaseRichBolt instead.
Thanks,
Arun
From: Alexander T
Reply-To: