RE: High CPU usage by storm workers

2014-09-25 Thread Simon Cooper
We’ve found the same thing. A significant amount of cpu time is being taken up by unlocking (?) within the disruptor queue. From: Rahul Mittal [mailto:rahulmitta...@gmail.com] Sent: 24 September 2014 16:16 To: user@storm.incubator.apache.org Subject: High CPU usage by storm workers Hi We are

RE: What happens on a batch timeout?

2014-09-25 Thread Simon Cooper
to use the batch id to filter tuples doesn't seem to work. Unfortunately, I can't understand the behaviour without some input from someone who knows how trident works and can match this behaviour onto what trident is *meant* to do on a batch replay. SimonC From: Simon Cooper [mailto:simon.coo

Storm spending a long time waiting for DisruptorQueue.publish

2014-09-03 Thread Simon Cooper
I'm trying to increase the throughput of one of our trident topologies. Running a performance profile on the worker, it's spending a lot of time waiting on a ReentrantLock via executor$mk_executor_transfer_fn and DisruptorQueue.publish. My question is, why is it waiting? It's trying to send

RE: [DISCUSS] Apache Storm Release 0.9.3/0.10.0

2014-08-29 Thread Simon Cooper
May I request that the patch for STORM-341 is merged into trunk for the 0.9.3 release? SimonC -Original Message- From: P.Taylor Goetz [mailto:ptgo...@gmail.com] Sent: 28 August 2014 21:35 To: d...@storm.incubator.apache.org Cc: user@storm.incubator.apache.org Subject: [DISCUSS] Apache

What happens on a batch timeout?

2014-08-19 Thread Simon Cooper
When a batch times out, what happens to all the current in-flight tuples when the batch is replayed? Are they removed from the executor queues, or are they left in the queues, so they might be received by the executor as part of the replayed batch/next batch, if the executor is running behind?

RE: What happens on a batch timeout?

2014-08-19 Thread Simon Cooper
BTW, I'm referring to trident batches. From: Simon Cooper [mailto:simon.coo...@featurespace.co.uk] Sent: 19 August 2014 15:49 To: user@storm.incubator.apache.org Subject: What happens on a batch timeout? When a batch times out, what happens to all the current in-flight tuples when the batch

RE: storm crashes when one of the zookeeper server dies

2014-07-04 Thread Simon Cooper
Was that machine also running storm nimbus? Nimbus is currently a sort-of SPOF, but it isn’t really one; if Nimbus goes down, the supervisors continue running the topologies as before. There is currently an issue to implement a high-availability nimbus at

RE: [VOTE] Storm Logo Contest - Final Round

2014-06-10 Thread Simon Cooper
#10 – 5 pts From: Rafik Naccache [mailto:rafik.nacca...@gmail.com] Sent: 10 June 2014 08:13 To: user@storm.incubator.apache.org Subject: Re: [VOTE] Storm Logo Contest - Final Round #10 3 pts #9 2 pts On 9 June 2014 at 22:17, Adam Lewis m...@adamlewis.com wrote: #10 -

RE: Recovering From Zookeeper Failure

2014-05-21 Thread Simon Cooper
This has already been reported: https://issues.apache.org/jira/browse/STORM-307. The workaround we’ve implemented is to always delete the {storm.home}/supervisor and {storm.home}/workers directories in our storm init scripts before starting the supervisor. From: Saurabh Agarwal (BLOOMBERG/

RE: [VOTE] Storm Logo Contest - Round 1

2014-05-21 Thread Simon Cooper
#10: 3pts #6: 2pts From: jose farfan [mailto:josef...@gmail.com] Sent: 21 May 2014 11:38 To: user@storm.incubator.apache.org Subject: Re: [VOTE] Storm Logo Contest - Round 1 #6 - 5 pts On Thu, May 15, 2014 at 6:28 PM, P. Taylor Goetz ptgo...@gmail.com wrote: This is a

RE: Multiple Storm components getting assigned to same worker slot despite of free slots being available

2014-05-16 Thread Simon Cooper
EvenScheduler doesn't seem to do the trick. I've set the scheduler to be EvenScheduler, but the two topologies we've got are still being assigned to the same supervisor, when there are 3 possible supervisors to assign to. It's hard to tell exactly what EvenScheduler is doing; is there some

Emitter.emitPartitionBatchNew not being called after Coordinator.isReady returns true

2014-05-12 Thread Simon Cooper
I've got a very very strange problem with one of my topologies. We've tried deploying to a clustered environment, and the trident topology we've got isn't running the Emitter when the Coordinator returns true from isReady(). At all. The logging message right at the start of the method is not

RE: Replacing DRPC With Kafka

2014-04-04 Thread Simon Cooper
We use Kafka for inter-topology communication. Going from input -> Kafka -> Storm -> Kafka -> output takes around 20-40ms. You do need to use Kafka 0.8 or greater, though, as previous versions force an fsync on every message, which makes performance plummet. The main reasons we went for Kafka rather
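
A rough sketch of the Kafka-to-Storm side of such a bridge, assuming the storm-kafka module (Storm 0.9.x-era API); the ZooKeeper address, topic name, zkRoot and spout id below are placeholders, not values from the thread:

```java
// Sketch only: assumes the storm-kafka module is on the classpath.
// ZooKeeper address, topic, zkRoot and spout id are placeholders.
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class KafkaInputSketch {
    public static void main(String[] args) {
        // Point the spout at the topic the upstream topology publishes to.
        BrokerHosts hosts = new ZkHosts("zookeeper1:2181");
        SpoutConfig spoutConfig =
                new SpoutConfig(hosts, "upstream-topic", "/kafka-bridge", "bridge-spout");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        // Downstream bolts attach to this spout as usual; the topology's own
        // output would go back to Kafka from a bolt at the end.
        builder.setSpout("kafka-in", new KafkaSpout(spoutConfig), 2);
    }
}
```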

RE: failed to start supervisor with missing stormconf.ser

2014-03-27 Thread Simon Cooper
I've got the same error. In a running cluster, I kill the supervisor running on one of the machines, wait until storm reassigns the topology that was on that machine (called Sync), and then bring the supervisor up again. It immediately dies, with the following in the log: 2014-03-27 10:50:12

RE: Storm Applications

2014-02-26 Thread Simon Cooper
Using normal storm, any bolt can output to anything at any time, as each bolt runs arbitrary code. So a bolt in the middle of a topology can write to a database, or file, or anything else you need. It will likely be the last bolt in the topology, but it doesn't have to be. If you use trident,
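
As an illustration of a mid-topology bolt writing to a database, here is a minimal sketch using plain JDBC; the connection URL, table and field names are placeholders, not anything from the thread:

```java
// Minimal sketch of a mid-topology bolt that writes each tuple to a database
// via plain JDBC and then passes the tuple on downstream.
// The JDBC URL, table and field names are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class AuditBolt extends BaseBasicBolt {
    private transient Connection connection;

    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        try {
            // Open the connection once per worker, not per tuple.
            connection = DriverManager.getConnection("jdbc:postgresql://db-host/audit");
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        try (PreparedStatement stmt =
                connection.prepareStatement("INSERT INTO events (id, payload) VALUES (?, ?)")) {
            stmt.setString(1, input.getStringByField("id"));
            stmt.setString(2, input.getStringByField("payload"));
            stmt.executeUpdate();
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
        // Not the last bolt in the topology: forward the tuple as well.
        collector.emit(new Values(input.getStringByField("id"),
                                  input.getStringByField("payload")));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "payload"));
    }
}
```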

RE: Problems with a coordinator and emitter of an error-handling IPartitionedTridentSpout

2014-02-21 Thread Simon Cooper
the coordinator and emitter, it should be based on an external store like Redis or database etc. Hope this helps! MK On Thu, Feb 20, 2014 at 12:05 PM, Simon Cooper simon.coo...@featurespace.co.uk wrote: I'm trying to implement an error-handling IPartitionedTridentSpout

Problems with a coordinator and emitter of an error-handling IPartitionedTridentSpout

2014-02-20 Thread Simon Cooper
I'm trying to implement an error-handling IPartitionedTridentSpout that limits the number of retries of a batch. The problem I've got is the interaction between the coordinator and emitter. The spout reads from a Kafka queue. If no messages have been put on the queue recently, then the
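
One way to cap replays, sketched below as a hypothetical in-memory helper (not part of the Trident spout interfaces and not the poster's code), is to count attempts per transaction id in the emitter; as the quoted reply in this thread notes, a real implementation would keep this state in an external store such as Redis rather than in the worker's memory:

```java
// Hypothetical helper an emitter could consult to cap batch replays.
// In-memory only; a production version would use an external store.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class BatchRetryTracker {
    private final int maxRetries;
    private final ConcurrentMap<Long, Integer> attempts = new ConcurrentHashMap<>();

    public BatchRetryTracker(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Returns true if the batch should be (re)emitted, false once the retry limit is hit. */
    public boolean shouldEmit(long txid) {
        Integer previous = attempts.get(txid);
        int attempt = previous == null ? 1 : previous + 1;
        attempts.put(txid, attempt);
        return attempt <= maxRetries;
    }

    /** Call once a batch is known to have committed, to stop tracking it. */
    public void onSuccess(long txid) {
        attempts.remove(txid);
    }
}
```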

Notifying a bolt of a tuple failure

2014-02-17 Thread Simon Cooper
I'm trying to implement a bolt that needs to be notified of tuple failures (it puts incoming tuples on a queue, and gets notified separately when to emit the tuples on the queue. If a tuple fails in another bolt, I need to take it off the queue). However, I've come across multiple issues in
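
For reference, in plain Storm only the emitting spout's ack() and fail() callbacks are invoked with the message id, so one common workaround (a sketch under that assumption, not the poster's design) is to keep the pending queue at the spout and drop entries there on failure; the source queue and payload handling are placeholders:

```java
// Sketch: keep the pending queue at the spout, since only the spout's
// ack()/fail() callbacks fire. pollSource() is a placeholder.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class QueueBackedSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private final Map<String, String> pending = new ConcurrentHashMap<>();
    private long counter = 0;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        String payload = pollSource(); // placeholder for reading the real source
        if (payload == null) {
            return;
        }
        String msgId = Long.toString(counter++);
        pending.put(msgId, payload);
        // Emit with a message id so ack/fail are reported back to this spout.
        collector.emit(new Values(payload), msgId);
    }

    @Override
    public void ack(Object msgId) {
        pending.remove(msgId); // fully processed downstream
    }

    @Override
    public void fail(Object msgId) {
        pending.remove(msgId); // failed in some bolt: take it off the queue
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("payload"));
    }

    private String pollSource() {
        return null; // stubbed out for the sketch
    }
}
```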

Number of executors number of tasks?

2014-02-17 Thread Simon Cooper
I've been looking at the parallelism of a storm topology. In what situations is it useful to have more than one task running on the same executor? An executor is a thread, so if there are several tasks on that executor, only one task can run at a time. So why would you want more than one task on
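
For context, a minimal sketch of how the parallelism hint (executors) and setNumTasks (tasks) are declared on a bolt; the no-op bolt, component names and upstream spout id are placeholders:

```java
// Sketch: 2 executors (threads) but 4 tasks, so the bolt can later be
// rebalanced up to 4 executors without redeploying the topology.
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class ParallelismSketch {
    public static class NoOpBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) { }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) { }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // "input" refers to a spout declared elsewhere in the topology.
        builder.setBolt("noop", new NoOpBolt(), 2) // parallelism hint = 2 executors
               .setNumTasks(4)                     // 4 tasks shared between them
               .shuffleGrouping("input");
    }
}
```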

RE: Threading and partitioning on trident bolts

2014-01-28 Thread Simon Cooper
, but in that case it would be better to just work on scheduler improvements to accomplish that and have the split tasks run as separate tasks within the same worker. On Mon, Jan 27, 2014 at 5:14 AM, Simon Cooper simon.coo...@featurespace.co.uk wrote: I've

Threading and partitioning on trident bolts

2014-01-27 Thread Simon Cooper
I've been looking at the trident code. If the following code is run on the same bolt: TridentStream input = ... TridentStream firstSection = input.each(new Fields(...), new FirstFilter()); TridentStream secondSection = input.each(new Fields(...), new SecondFilter()); From looking at
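
A fleshed-out version of the snippet's example, assuming Trident's Stream API (the snippet's TridentStream type is written as Stream here); the filter logic and the "word" field are placeholders:

```java
// Expanded version of the snippet's example; the filters and field name
// are placeholders. Whether the two each() calls end up as separate
// executors in the same bolt is exactly what the question is about.
import backtype.storm.tuple.Fields;
import storm.trident.Stream;
import storm.trident.operation.BaseFilter;
import storm.trident.tuple.TridentTuple;

public class SplitStreamSketch {
    public static class FirstFilter extends BaseFilter {
        @Override
        public boolean isKeep(TridentTuple tuple) {
            return tuple.getString(0).startsWith("a");
        }
    }

    public static class SecondFilter extends BaseFilter {
        @Override
        public boolean isKeep(TridentTuple tuple) {
            return tuple.getString(0).startsWith("b");
        }
    }

    public static void buildStreams(Stream input) {
        // Both branches consume the same input stream.
        Stream firstSection = input.each(new Fields("word"), new FirstFilter());
        Stream secondSection = input.each(new Fields("word"), new SecondFilter());
    }
}
```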

Repartitioning a broadcast partitioned stream

2014-01-15 Thread Simon Cooper
Once you've partitioned a trident stream using broadcast (so every tuple goes to every partition), how do you repartition the stream back down to 1 partition? Presumably doing it the naïve way would result in each tuple being duplicated once per broadcast partition; how would you combine
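
A minimal sketch of one way to bring such a stream back to a single partition, assuming Trident's Stream API: global() repartitions everything into one partition, and an aggregation then combines the duplicated tuples; Count and the field names here are only placeholders for whatever combination step the application actually needs.

```java
// Sketch: global() brings the stream back to a single partition, and an
// aggregation (Count is purely a placeholder) combines the duplicates
// produced by the earlier broadcast().
import backtype.storm.tuple.Fields;
import storm.trident.Stream;
import storm.trident.operation.builtin.Count;

public class RepartitionSketch {
    /** Takes the stream after broadcast() and any per-partition work. */
    public static Stream collapse(Stream broadcastProcessed) {
        return broadcastProcessed
                .global()   // repartition everything into one partition
                .aggregate(new Fields("word"), new Count(), new Fields("count"));
    }
}
```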