I'm considering a situation in which a stream is split in two, as below. At the 
end of each stream is a statePersist that puts the tuple onto an external 
queue, and each statePersist output has very low-latency requirements wrt the 
input. If there's no changes in partitioning or parallelism, trident will put 
both streams onto the same bolt, which will delay the statePersist on the 
second stream until the first one is complete. Running both streams in separate 
threads will fix this.

I don't understand your point about synchronization around OutputCollector; the 
wiki Concepts page says OutputCollector is thread safe? Some locking on stream 
joins would be required, but that's easy to do by protecting calls to the 
MultiReducer with a lock in MultiReducerProcessor.

Unless there's some way of forcing trident to start a new bolt without changing 
the partitioning or parallelism that I've missed?

SimonC

From: nathan.m...@gmail.com [mailto:nathan.m...@gmail.com] On Behalf Of Nathan 
Marz
Sent: 27 January 2014 20:11
To: user@storm.incubator.apache.org
Subject: Re: Threading and partitioning on trident bolts

Correct, they will be run sequentially within the bolt. Threading within a bolt 
adds a ton of complication (needing to synchronize around output collector, for 
instance), so that's really bad. What are you trying to achieve by adding more 
threading? You won't get better resource usage because the rest of the topology 
is parallelized. You might get better latency, but in that case it would be 
better to just work on scheduler improvements to accomplish that and have the 
"split" tasks run as separate tasks within the same worker.

On Mon, Jan 27, 2014 at 5:14 AM, Simon Cooper 
<simon.coo...@featurespace.co.uk<mailto:simon.coo...@featurespace.co.uk>> wrote:
I've been looking at the trident code. If the following code is run on the same 
bolt:

TridentStream input = ...

TridentStream firstSection = input.each(new Fields(...), new FirstFilter());
TridentStream secondSection = input.each(new Fields(...), new SecondFilter());

>From looking at FreshCollector and AppendCollector, it looks like the two 
>filters will be run sequentially. Are there plans/thoughts on involving 
>threading when a stream splits like this? Running each split on a new thread 
>in parallel? There may be some complications when joining streams together, 
>but that's a simple solution that a couple of locks on the 
>MultiReducerProcessor would solve. Has this been considered before?

SimonC



--
Twitter: @nathanmarz
http://nathanmarz.com<http://nathanmarz.com/>

Reply via email to