Curtis, how do you set storm.message.timeout.secs?

2015-03-10 17:07 GMT+02:00 Curtis Allen <curtis.n.al...@gmail.com>:
> Tuning a topology that contains bolts with unpredictable execute latency
> is extremely difficult. I've had to slow down the entire topology by
> increasing storm.max.spout.pending and storm.message.timeout.secs;
> otherwise you'll have tuples queue up and time out.
>
> On Tue, Mar 10, 2015 at 8:53 AM Martin Illecker <millec...@apache.org> wrote:
>
>> I would be interested in a solution for high latency bolts as well.
>>
>> Maybe a custom scheduler that prioritizes high latency bolts might help?
>> (e.g., allowing a worker to exclusively run high latency bolts)
>>
>> Does anyone have a working solution for a high-throughput topology (x0000
>> tuples / sec) including an HTTPClient bolt (latency around 100ms)?
>>
>> 2015-03-08 20:35 GMT+01:00 Frank Jania <fja...@gmail.com>:
>>
>>> I've been running Storm successfully for a while now with a fairly
>>> simple topology of this form:
>>>
>>> spout with a stream of tweets --> bolt to check the tweet's user against
>>> a cache --> bolts to do some persistence based on tweet content.
>>>
>>> So far that's been humming along quite well, with execute latencies in
>>> the low single-digit or sub-millisecond range. Other than setting the
>>> parallelism for various bolts, I've been able to run it with the default
>>> topology config pretty well.
>>>
>>> Now I'm trying a topology of the form:
>>>
>>> spout with a stream of tweets --> bolt to extract the URLs in the tweet
>>> --> bolt to fetch each URL and get the page's title.
>>>
>>> For this topology the "fetch" portion can have a much longer latency:
>>> I'm seeing execute latencies in the 300-500ms range to accommodate the
>>> fetch of these arbitrary URLs. I've implemented caching to avoid
>>> fetching URLs I already have titles for, and I'm using socket/connection
>>> timeouts to keep fetches from hanging for too long, but even so, this is
>>> going to be a bottleneck.
>>> I've already set the parallelism for the fetch bolt fairly high, but are
>>> there any best practices for configuring a topology like this, where at
>>> least one bolt takes much more time to process than the rest?
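On Frank's question about a bolt that is much slower than the rest: one common workaround (not a Storm API, just a pattern) is to make the slow bolt asynchronous internally, so a single executor keeps many fetches in flight and acks each tuple only when its fetch completes. Back-of-the-envelope: at 10,000 tuples/sec and ~400 ms per fetch you need roughly 10,000 × 0.4 = 4,000 fetches in flight, which is impractical with one synchronous execute() per executor. The sketch below is a self-contained stand-in using only java.util.concurrent — the class name, sleep duration, and limits are all illustrative, and the Storm-specific calls (collector.emit/ack) appear only as comments:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncFetchSketch {
    // Processes `tasks` simulated fetches with at most `maxPending` in flight,
    // mirroring what topology.max.spout.pending does for un-acked tuples.
    // Returns the number of completed fetches.
    static int processAll(int tasks, int maxPending) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(32);
        Semaphore inFlight = new Semaphore(maxPending);
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            inFlight.acquire();                  // throttle the "spout" when too much is pending
            CompletableFuture.runAsync(() -> {
                try { Thread.sleep(5); }         // stand-in for a 300-500 ms HTTP fetch
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }, pool).whenComplete((v, err) -> {
                // in a real bolt you would collector.emit(...) and collector.ack(tuple) here
                completed.incrementAndGet();
                inFlight.release();
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return completed.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("processed " + processAll(200, 64));
    }
}
```

With this shape, raising concurrency is a matter of the internal pool/semaphore sizes rather than Storm executor counts; the message timeout still has to exceed the worst-case queueing plus fetch time, so it interacts with the tuning Curtis describes above.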
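Partially answering my own question at the top of the thread: as far as I can tell, the settings Curtis mentions are actually spelled `topology.message.timeout.secs` and `topology.max.spout.pending` in Storm's defaults.yaml (the `storm.*` prefix is used for cluster-level keys). A sketch of overriding them, with placeholder values:

```yaml
# storm.yaml, or merged into a per-topology config; values are examples only
topology.message.timeout.secs: 600   # give the slow fetch bolt time before tuples are replayed
topology.max.spout.pending: 500      # cap un-acked tuples per spout task to avoid timeout storms
```

The same can be done per topology in Java with `Config#setMessageTimeoutSecs(int)` and `Config#setMaxSpoutPending(int)` before submitting.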