Curtis, how do you set storm.message.timeout.secs?

2015-03-10 17:07 GMT+02:00 Curtis Allen <curtis.n.al...@gmail.com>:
> Tuning a topology that contains bolts with unpredictable execute latency
> is extremely difficult. I've had to slow down the entire topology by
> increasing storm.max.spout.pending and storm.message.timeout.secs;
> otherwise you'll have tuples queue up and time out.
>
> On Tue, Mar 10, 2015 at 8:53 AM Martin Illecker <millec...@apache.org> wrote:
>
>> I would be interested in a solution for high latency bolts as well.
>>
>> Maybe a custom scheduler that prioritizes high latency bolts might help?
>> (e.g., allowing a worker to exclusively run high latency bolts)
>>
>> Does anyone have a working solution for a high-throughput topology (x0000
>> tuples / sec) including an HTTPClient bolt (latency around 100ms)?
>>
>> 2015-03-08 20:35 GMT+01:00 Frank Jania <fja...@gmail.com>:
>>
>>> I've been running Storm successfully for a while now with a fairly
>>> simple topology of this form:
>>>
>>> spout with a stream of tweets --> bolt to check the tweet's user against
>>> a cache --> bolts to do some persistence based on tweet content.
>>>
>>> So far that's been humming along quite well, with execute latencies in
>>> the low single-digit or sub-millisecond range. Other than setting the
>>> parallelism for various bolts, I've been able to run it with the default
>>> topology config pretty well.
>>>
>>> Now I'm trying a topology of the form:
>>>
>>> spout with a stream of tweets --> bolt to extract the URLs in the tweet
>>> --> bolt to fetch each URL and get the page's title.
>>>
>>> For this topology the "fetch" portion can have a much longer latency:
>>> I'm seeing execute latencies in the 300-500ms range to accommodate the
>>> fetch of these arbitrary URLs. I've implemented caching to avoid
>>> fetching URLs I already have titles for, and I'm using socket/connection
>>> timeouts to keep fetches from hanging for too long, but even so, this is
>>> going to be a bottleneck.
>>> I've already set the parallelism for the fetch bolt fairly high, but are
>>> there any best practices for configuring a topology like this, where at
>>> least one bolt takes much more time to process than the rest?
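On Frank's question about a bolt that is much slower than the rest: one common workaround (not a Storm API, just a pattern) is to make the slow bolt asynchronous internally, so a single executor keeps many fetches in flight and acks each tuple only when its fetch completes. Back-of-the-envelope: at 10,000 tuples/sec and ~400 ms per fetch you need roughly 10,000 × 0.4 = 4,000 fetches in flight, which is impractical with one synchronous execute() per executor. The sketch below is a self-contained stand-in using only java.util.concurrent — the class name, sleep duration, and limits are all illustrative, and the Storm-specific calls (collector.emit/ack) appear only as comments:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncFetchSketch {
    // Processes `tasks` simulated fetches with at most `maxPending` in flight,
    // mirroring what topology.max.spout.pending does for un-acked tuples.
    // Returns the number of completed fetches.
    static int processAll(int tasks, int maxPending) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(32);
        Semaphore inFlight = new Semaphore(maxPending);
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            inFlight.acquire();                  // throttle the "spout" when too much is pending
            CompletableFuture.runAsync(() -> {
                try { Thread.sleep(5); }         // stand-in for a 300-500 ms HTTP fetch
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }, pool).whenComplete((v, err) -> {
                // in a real bolt you would collector.emit(...) and collector.ack(tuple) here
                completed.incrementAndGet();
                inFlight.release();
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return completed.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("processed " + processAll(200, 64));
    }
}
```

With this shape, raising concurrency is a matter of the internal pool/semaphore sizes rather than Storm executor counts; the message timeout still has to exceed the worst-case queueing plus fetch time, so it interacts with the tuning Curtis describes above.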
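Partially answering my own question at the top of the thread: as far as I can tell, the settings Curtis mentions are actually spelled `topology.message.timeout.secs` and `topology.max.spout.pending` in Storm's defaults.yaml (the `storm.*` prefix is used for cluster-level keys). A sketch of overriding them, with placeholder values:

```yaml
# storm.yaml, or merged into a per-topology config; values are examples only
topology.message.timeout.secs: 600   # give the slow fetch bolt time before tuples are replayed
topology.max.spout.pending: 500      # cap un-acked tuples per spout task to avoid timeout storms
```

The same can be done per topology in Java with `Config#setMessageTimeoutSecs(int)` and `Config#setMaxSpoutPending(int)` before submitting.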