As an example: if you have 10 Kafka partitions, 10 spout tasks, and more than 10 downstream bolt tasks split across more than 10 workers, then with shuffleGrouping the work will be evenly distributed, but with localOrShuffle some of your bolt tasks will be starved.
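The starvation effect above can be seen with a toy simulation (this is not Storm code; the worker count, task placement, and tuple counts are assumed for illustration). Twelve workers each host one bolt task, and ten spout tasks sit on workers 0-9. localOrShuffle keeps every tuple inside the emitting worker, so the bolt tasks on workers 10 and 11 never receive anything; shuffle round-robins across all bolt tasks.

```java
import java.util.Arrays;

// Toy model (not Storm code): 12 workers, one bolt task per worker,
// 10 spout tasks pinned to workers 0..9, each emitting 1000 tuples.
public class GroupingDemo {
    static long[] distribute(boolean localOrShuffle, int workers,
                             int spoutTasks, int tuplesPerSpout) {
        long[] counts = new long[workers];
        int rr = 0;
        for (int s = 0; s < spoutTasks; s++) {
            for (int t = 0; t < tuplesPerSpout; t++) {
                // localOrShuffle: deliver to the bolt task on the same
                // worker as spout task s; shuffle: round-robin everywhere
                counts[localOrShuffle ? s : rr++ % workers]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("localOrShuffle:  "
                + Arrays.toString(distribute(true, 12, 10, 1000)));
        System.out.println("shuffleGrouping: "
                + Arrays.toString(distribute(false, 12, 10, 1000)));
    }
}
```

Running it shows workers 10 and 11 at zero under localOrShuffle while shuffle spreads the same 10,000 tuples almost perfectly evenly.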
On Thu, Dec 4, 2014 at 9:46 AM, Vladi Feigin <vladi...@gmail.com> wrote:
> Hi,
>
> What would you recommend in our situation then? Packing all operations
> together into a single bolt probably will not help in our case: our first
> bolt receives 400K tuples per second, most of which it filters out, but
> an ack is sent for each filtered tuple anyway.
>
> Another question: when is it preferable to use shuffle over
> localOrShuffle? I was thinking that localOrShuffle is always better since
> it reduces the network cost. Is there a scenario where shuffle is
> preferable?
> Thank you in advance,
> Vladi
>
> On Tue, Dec 2, 2014 at 6:45 AM, Nathan Marz <nat...@nathanmarz.com> wrote:
>
>> Acking is 1 message per tuple, which is at most a 50% throughput drop.
>> However, messages sent for acking are quite small, so it is not likely
>> to be nearly that much of a drop. In addition, the acker bolt is highly
>> efficient and uses very little CPU.
>>
>> Local acking does not really make sense. If you're in a situation where
>> you would benefit from "local acking", that means you have a lot of
>> bolts strung together with "localOrShuffle", in which case you should
>> try packing those operations together into a single bolt (as Trident
>> does).
>>
>> On Mon, Dec 1, 2014 at 2:41 PM, Vladi Feigin <vladi...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> We use Storm 0.8.2. Our throughput is 400K messages per second.
>>> From the Storm UI we calculate that the total latency of all spouts
>>> and bolts is ~3.5 min to process 10 min of data. But in reality it
>>> takes 13 min! Obviously this creates huge backlogs.
>>> We don't have time-out failures at all, and we don't see other
>>> exceptions. So our main suspect is Storm's acking mechanism, which
>>> uses a lot of network.
>>> (BTW, if you have another opinion, please let me know.)
>>> We think the fact that all ack messages go via 0mq, even when the
>>> acker bolt runs in the same worker, causes this huge performance drop.
>>> An ack is sent per tuple (micro-batches are not supported), which is
>>> inefficient. As far as we know, there is no way to make the acker bolt
>>> work with a local shuffle (as is possible for other bolts).
>>> We'd like to ask your opinion on a new proposed feature for Storm:
>>> support local acking. That means if an acker runs locally in the same
>>> worker, the ack messages are sent via the local Disruptor queue (like
>>> localShuffle) rather than via 0mq.
>>>
>>> Does it make sense? What do you think?
>>>
>>> If you think the root cause of our problem is something else, please
>>> let us know.
>>> Thank you in advance,
>>> Vladi
>>
>> --
>> Twitter: @nathanmarz
>> http://nathanmarz.com
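The "at most a 50% throughput drop" figure quoted in the thread follows from simple message counting: each user tuple adds one ack message, so the total message count at most doubles. A back-of-envelope sketch using the thread's 400K tuples/sec (a toy calculation, not a measurement; it assumes throughput is bound purely by message count, which Nathan's point about small ack messages argues against):

```java
// Toy message-counting model for acking overhead (assumed, not measured).
public class AckOverhead {
    static long messageRate(long tupleRate, boolean acking) {
        // With acking on, one extra ack message is sent per tuple.
        return acking ? 2 * tupleRate : tupleRate;
    }

    public static void main(String[] args) {
        long tuples = 400_000; // tuples/sec, from the thread
        long without = messageRate(tuples, false);
        long with = messageRate(tuples, true);
        System.out.println("without acking: " + without + " msg/s");
        System.out.println("with acking:    " + with + " msg/s");
        // If transport were purely message-bound, the drop would be
        // 1 - without/with = 50%; in practice it is much less, since
        // ack messages are far smaller than data tuples.
    }
}
```

This is the upper bound; the observed 13 min vs ~3.5 min gap in the thread is larger than acking alone can explain under this model, which supports looking at other causes too.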