Re: Best way of scaling with a single spout

Javier Gonzalez Thu, 14 May 2015 17:06:27 -0700

Hi Jeff,

We believe that because a spouts-only topology (we needed to coordinate
transactions between three systems and ensure exactly-once semantics) gave
us better performance, even when fed from a slower input. When we added
bolts, performance degraded because tuples had to be shipped to the bolts
instead of being processed immediately. Batching messages (i.e., passing
tuples that contained lists of elements to be processed) also improved
performance, since there was less network coordination chatter.
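
To illustrate the batching idea, here is a minimal sketch in plain Java. It is
not our actual code and uses no Storm API: the class name TupleBatcher and the
method toBatches are hypothetical, chosen only for this example. The assumption
is that the spout would then emit each returned list as a single tuple value,
so one transfer carries many elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Storm or AMPS.
final class TupleBatcher {

    private TupleBatcher() {
    }

    // Splits the input into consecutive lists of at most batchSize elements.
    // Each returned list is meant to travel as the payload of one tuple.
    static <T> List<List<T>> toBatches(List<T> elements, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < elements.size(); start += batchSize) {
            int end = Math.min(start + batchSize, elements.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(elements.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("a", "b", "c", "d", "e");
        // With a batch size of 2, five messages need three transfers, not five.
        for (List<String> batch : toBatches(messages, 2)) {
            System.out.println(batch);
        }
    }
}
```

The batch size is a trade-off: larger batches mean fewer transfers, but a
failed batch has to be replayed as a whole.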


Regards,
Javier
On May 11, 2015 11:11 AM, "Jeff Maass" <jma...@cccis.com> wrote:

>  I think the answer to your question hinges on this statement:
> "I believe the farming out of the processing to different nodes is hurting
> our performance."
> What makes you believe this?
>
>
>
>
>   From: Javier Gonzalez <jagon...@gmail.com>
> Reply-To: "user@storm.apache.org" <user@storm.apache.org>
> Date: 2015,Saturday, May 9 at 17:07
> To: "user@storm.apache.org" <user@storm.apache.org>
> Subject: Re: Best way of scaling with a single spout
>
>    Hi Supun,
>
> Thank you for your response. Actually I can't use Kafka, but I believe
> there is a way to achieve what you suggest with AMPS.
>
>  Regards,
>  JG
>
> On Sat, May 9, 2015 at 5:15 PM, Supun Kamburugamuva <supu...@gmail.com>
> wrote:
>
>> You can use Kafka. You can partition your topic using a key and this will
>> give you the capability to use multiple spouts to read from the same topic.
>>
>>  Supun..
>>
>> On Sat, May 9, 2015 at 4:57 PM, Javier Gonzalez <jagon...@gmail.com>
>> wrote:
>>
>>>   Hi,
>>>
>>>  I'm currently approaching the design of an application that will have a
>>> single source of data from AMPS (a high-speed pub-sub system like Kafka). We
>>> are currently facing the issue that the spout is much faster than the
>>> bolts, and I believe the farming out of the processing to different nodes
>>> is hurting our performance. Previously we had several consumers on a
>>> queue-like producer, so each spout would likely transfer to the "nearest"
>>> bolts, but now with the pub-sub model we can't just consume blindly off the
>>> source or we would face duplication.
>>>
>>>  Any ideas on how to approach this? One idea we're toying with is using
>>> more than one consumer, but with filters so that we can ensure there are no
>>> duplicate reads. I would be grateful for any other ideas you might have :)
>>>
>>>  best regards,
>>>
>>> --
>>> Javier González Nicolini
>>>
>>
>>
>>
>>   --
>> Supun Kamburugamuva
>> Member, Apache Software Foundation; http://www.apache.org
>> E-mail: supu...@gmail.com;  Mobile: +1 812 369 6762
>> Blog: http://supunk.blogspot.com
>>
>>
>
>
> --
> Javier González Nicolini
>
