We’ve found the same thing. A significant amount of CPU time is being taken up
by unlocking (?) within the disruptor queue.
From: Rahul Mittal [mailto:rahulmitta...@gmail.com]
Sent: 24 September 2014 16:16
To: user@storm.incubator.apache.org
Subject: High CPU usage by storm workers
Hi
We are
to use the batch
id to filter tuples doesn't seem to work.
Unfortunately, I can't understand the behaviour without some input from someone
who knows how trident works and can match this behaviour onto what trident is
*meant* to do on a batch replay.
SimonC
From: Simon Cooper [mailto:simon.coo
I'm trying to increase the throughput of one of our trident topologies. A
performance profile of the worker shows it spending a lot of time waiting on a
ReentrantLock via executor$mk_executor_transfer_fn and DisruptorQueue.publish.
My question is, why is it waiting? It's trying to send
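For anyone profiling the same hot spot: the publish call blocks when the ring
buffer it is publishing into has no free slots, so the internal queue sizes are
one thing worth checking. A sketch of enlarging them (the values are
illustrative, must be powers of two, and whether this helps depends on where
the backlog really is):

import backtype.storm.Config;

public class BufferTuning {
    // Illustrative values only; the right numbers depend on the topology,
    // tuple size and hardware.
    public static Config withLargerBuffers() {
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
        conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
        conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 4096);
        return conf;
    }
}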
May I request that the patch for STORM-341 be merged into trunk for the 0.9.3
release?
SimonC
-Original Message-
From: P.Taylor Goetz [mailto:ptgo...@gmail.com]
Sent: 28 August 2014 21:35
To: d...@storm.incubator.apache.org
Cc: user@storm.incubator.apache.org
Subject: [DISCUSS] Apache
When a batch times out, what happens to all the current in-flight tuples when
the batch is replayed? Are they removed from the executor queues, or are they
left in the queues, so they might be received by the executor as part of the
replayed batch/next batch, if the executor is running behind?
BTW, I'm referring to trident batches.
From: Simon Cooper [mailto:simon.coo...@featurespace.co.uk]
Sent: 19 August 2014 15:49
To: user@storm.incubator.apache.org
Subject: What happens on a batch timeout?
When a batch times out, what happens to all the current in-flight tuples when
the batch
Was that machine also running storm nimbus? Nimbus is often described as a
single point of failure, but it isn't really one: if nimbus goes down, the
supervisors keep running their topologies as before. There is an open issue to
implement a high-availability nimbus at
#10 – 5 pts
From: Rafik Naccache [mailto:rafik.nacca...@gmail.com]
Sent: 10 June 2014 08:13
To: user@storm.incubator.apache.org
Subject: Re: [VOTE] Storm Logo Contest - Final Round
#10 3 pts
#9 2 pts
On 9 June 2014 at 22:17, Adam Lewis <m...@adamlewis.com> wrote:
#10 -
This has already been reported: https://issues.apache.org/jira/browse/STORM-307.
The workaround we’ve implemented is to always delete the {storm.home}/supervisor
and {storm.home}/workers directories in our storm init scripts before starting
the supervisor.
From: Saurabh Agarwal (BLOOMBERG/
#10: 3pts
#6: 2pts
From: jose farfan [mailto:josef...@gmail.com]
Sent: 21 May 2014 11:38
To: user@storm.incubator.apache.org
Subject: Re: [VOTE] Storm Logo Contest - Round 1
#6 - 5 pts
On Thu, May 15, 2014 at 6:28 PM, P. Taylor Goetz <ptgo...@gmail.com> wrote:
This is a
EvenScheduler doesn't seem to do the trick. I've set the scheduler to be
EvenScheduler, but the two topologies we've got are still being assigned to the
same supervisor, when there are 3 possible supervisors to assign to. It's hard
to tell what exactly EvenScheduler is doing; is there some
I've got a very very strange problem with one of my topologies. We've tried
deploying to a clustered environment, and the trident topology we've got isn't
running the Emitter when the Coordinator returns true from isReady(). At all.
The logging message right at the start of the method is not
We use kafka for inter-topology communication. Going from input -> kafka ->
storm -> kafka -> output takes around 20-40ms. You do need kafka 0.8 or
greater, though, as previous versions force an fsync on every message, which
makes performance plummet.
The main reasons we went for kafka rather
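For anyone wiring up something similar, the reading side is just the
storm-kafka spout; a minimal sketch (the ZooKeeper host, topic, zkRoot and
client id below are made up, and the producing side is omitted):

import backtype.storm.spout.SchemeAsMultiScheme;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class KafkaInput {
    // Builds a spout that reads string messages from a kafka topic.
    public static KafkaSpout buildSpout() {
        ZkHosts hosts = new ZkHosts("zk-host:2181");
        SpoutConfig config = new SpoutConfig(hosts, "input-events", "/kafka-spout", "input-reader");
        config.scheme = new SchemeAsMultiScheme(new StringScheme());
        return new KafkaSpout(config);
    }
}

The producing side is typically just a bolt that wraps a kafka producer client.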
I've got the same error. In a running cluster, I kill the supervisor running on
one of the machines, wait until storm reassigns the topology that was on that
machine (called Sync), and then bring the supervisor up again. It immediately
dies, with the following in the log:
2014-03-27 10:50:12
Using normal storm, any bolt can output to anything at any time, as each bolt
runs arbitrary code. So a bolt in the middle of a topology can write to a
database, or file, or anything else you need. It will likely be the last bolt
in the topology, but it doesn't have to be.
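For anyone landing on this from a search, a minimal sketch of such a
mid-topology bolt, writing each tuple to a store and still forwarding it
downstream. EventDao is a made-up stand-in for whatever database client you
use, and the "event" field name is illustrative:

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class PersistAndForwardBolt extends BaseRichBolt {
    /** Made-up stand-in for a real database client (JDBC, Cassandra, ...). */
    static class EventDao {
        void save(String event) { /* the actual write would go here */ }
    }

    private transient EventDao dao;
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.dao = new EventDao();   // open connections here, not in the constructor
    }

    @Override
    public void execute(Tuple tuple) {
        String event = tuple.getStringByField("event");   // field name is illustrative
        dao.save(event);                                   // side effect: write to the store
        collector.emit(tuple, new Values(event));          // still forward downstream, anchored
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("event"));
    }
}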
If you use trident,
any state shared between the coordinator and emitter should be based on an
external store like Redis or a database.
Hope this helps!
MK
On Thu, Feb 20, 2014 at 12:05 PM, Simon Cooper <simon.coo...@featurespace.co.uk> wrote:
I'm trying to implement an error-handling IPartitionedTridentSpout that limits
the number of retries of a batch. The problem I've got is the interaction
between the coordinator and emitter.
The spout reads from a kafka queue. If no messages have been put on the queue
recently, then the
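For anyone solving the same problem, the retry-limiting part on its own is
small; a sketch of the idea (the limit of 5 and the choice to just emit an
empty batch once it's exceeded are assumptions, not necessarily what this
thread ended up doing):

import java.util.List;

import storm.trident.operation.TridentCollector;
import storm.trident.topology.TransactionAttempt;

public class RetryLimitedEmit {
    private static final int MAX_RETRIES = 5;   // assumed limit

    // The attempt id increments each time the same transaction id is replayed,
    // so it can be used to decide when to give up on a batch.
    public static void emitBatch(TransactionAttempt attempt, TridentCollector collector,
                                 List<List<Object>> messages) {
        if (attempt.getAttemptId() > MAX_RETRIES) {
            return;                              // give up: emit an empty batch this time
        }
        for (List<Object> tuple : messages) {
            collector.emit(tuple);
        }
    }
}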
I'm trying to implement a bolt that needs to be notified of tuple failures (it
puts incoming tuples on a queue, and gets notified separately when to emit the
tuples on the queue. If a tuple fails in another bolt, I need to take it off
the queue). However, I've come across multiple issues in
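For context, the pattern in question looks roughly like the sketch below. The
awkward part is that Storm only reports a failed tuple tree back to the spout's
fail(), never to an intermediate bolt, so a bolt can only hold its input tuples
un-acked (bounded by the message timeout) or learn about failures through some
external signal. The "flush" stream name and "event" field are assumptions:

import java.util.LinkedList;
import java.util.Map;
import java.util.Queue;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class BufferingBolt extends BaseRichBolt {
    private OutputCollector collector;
    private final Queue<Tuple> pending = new LinkedList<Tuple>();

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        if ("flush".equals(input.getSourceStreamId())) {      // assumed flush signal
            while (!pending.isEmpty()) {
                Tuple queued = pending.poll();
                collector.emit(queued, new Values(queued.getValueByField("event")));
                collector.ack(queued);                        // ack only once forwarded
            }
            collector.ack(input);
        } else {
            pending.add(input);                               // hold; do NOT ack yet
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("event"));
    }
}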
I've been looking at the parallelism of a storm topology. In what situations is
it useful to have more than one task running on the same executor? An executor
is a thread, so if there are several tasks on that executor only one task can run
at a time. So why would you want more than one task on
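For reference, the task/executor split is set when building the topology: the
task count is fixed at submit time, while the executor count can be changed
later with storm rebalance, so extra tasks leave room to scale a component up
without redeploying. A sketch using the stand-in test components that ship
with storm-core:

import backtype.storm.testing.TestWordCounter;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.TopologyBuilder;

public class ParallelismSketch {
    public static TopologyBuilder build() {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout(), 2);   // 2 executors, 2 tasks
        builder.setBolt("count", new TestWordCounter(), 2)   // 2 executors (threads)...
               .setNumTasks(8)                                // ...sharing 8 tasks, 4 per thread
               .shuffleGrouping("words");
        return builder;
    }
}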
, but in that case it would be
better to just work on scheduler improvements to accomplish that and have the
split tasks run as separate tasks within the same worker.
On Mon, Jan 27, 2014 at 5:14 AM, Simon Cooper <simon.coo...@featurespace.co.uk> wrote:
I've been looking at the trident code. If the following code is run on the same
bolt:
Stream input = ...
Stream firstSection = input.each(new Fields(...), new FirstFilter());
Stream secondSection = input.each(new Fields(...), new SecondFilter());
From looking at
Once you've partitioned a trident stream using broadcast (so every tuple goes
to every partition), how do you repartition the stream back down to 1
partition? Presumably doing it the naïve way would result in each tuple being
duplicated once per broadcast partition; how would you combine
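For anyone with the same question, one way to collapse back down is a global()
repartition after the per-partition work. Note that because broadcast() has
already copied every tuple to every partition, the single downstream partition
then sees one copy per upstream partition, so some de-duplication or
aggregation is still needed. A sketch (the field name and filter are
placeholders):

import backtype.storm.tuple.Fields;
import storm.trident.Stream;
import storm.trident.operation.BaseFilter;
import storm.trident.tuple.TridentTuple;

public class BroadcastThenCollapse {
    static class SomeFilter extends BaseFilter {
        @Override
        public boolean isKeep(TridentTuple tuple) {
            return true;    // placeholder per-partition check
        }
    }

    public static Stream collapse(Stream input) {
        return input.broadcast()                                 // copy to every partition
                    .each(new Fields("value"), new SomeFilter()) // per-partition work
                    .global();                                   // back to a single partition
    }
}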