It turned out to be the Kestrel queues; we had misread some basic test numbers. The
number of queues had to be scaled out.
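For the archives, the shape of the fix was to spread the load across more Kestrel
queues and give each its own spout. A rough sketch of what that looks like -- not our
production code; the hostnames, queue names, and parallelism numbers are made up,
ProcessBolt is a stand-in for a real bolt, and I'm going from memory on the
storm-kestrel KestrelThriftSpout constructor:

    import java.util.Arrays;
    import java.util.List;

    import backtype.storm.spout.StringScheme;
    import backtype.storm.topology.BoltDeclarer;
    import backtype.storm.topology.TopologyBuilder;
    // KestrelThriftSpout comes from the storm-kestrel project.
    import backtype.storm.spout.KestrelThriftSpout;

    public class ScaledOutTopology {
        public static void main(String[] args) {
            TopologyBuilder builder = new TopologyBuilder();

            // Hypothetical hosts/queues: one spout per Kestrel queue, so no
            // single queue serializes all the reads.
            List<String> hosts = Arrays.asList("kestrel-1", "kestrel-2");
            String[] queues = {"events_0", "events_1", "events_2", "events_3"};

            for (int i = 0; i < queues.length; i++) {
                // 2229 is Kestrel's thrift port; StringScheme emits a "str" field.
                builder.setSpout("kestrel-spout-" + i,
                        new KestrelThriftSpout(hosts, 2229, queues[i], new StringScheme()),
                        2);
            }

            // Downstream bolts subscribe to every spout.
            BoltDeclarer process = builder.setBolt("process", new ProcessBolt(), 4);
            for (int i = 0; i < queues.length; i++) {
                process.shuffleGrouping("kestrel-spout-" + i);
            }
            // ... then submit with StormSubmitter as usual.
        }
    }
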
On Fri, Jul 11, 2014 at 10:20 PM, Andrew Montalenti <and...@parsely.com> wrote:

> What's limiting your throughput? Your e-mail doesn't have enough
> information to make a diagnosis.
>
> Whether 4k or 6k processed messages per second is "fast" depends on a lot
> of factors -- average message size, parallelism, hardware, batching
> approach, etc.
>
> P. Taylor Goetz has a nice slide presentation discussing various factors
> to think about when scaling Storm topologies for throughput:
>
> http://www.slideshare.net/ptgoetz/scaling-storm-hadoop-summit-2014
>
> One trick I tend to use to identify throughput bottlenecks is to lay out a
> topology with mock bolts that do nothing but "pass tuples through",
> configured identically from a partitioning / parallelism standpoint to my
> actual topology. Then I see how much throughput I get simply piping tuples
> from the spout through that mock topology. This can often help you find
> issues like performance bugs originating at the spout, acking/emitting
> bugs, or other similar problems. It also lets you remove some components
> from your topology and performance-test them in isolation.
>
> You can also review this recent JIRA ticket about improvements to the
> Netty transport. Not only does it show a lot of engineering effort going
> into Storm's performance at scale, but the benchmarks listed there show
> throughput levels of several hundred thousand messages per second,
> saturating cores and network on the topology machines:
>
> https://issues.apache.org/jira/browse/STORM-297
>
> Please don't roll your own stream processor -- the world doesn't need
> another. :-D Something is likely wrong with the topology's layout, and I'm
> sure it's fixable.
>
> HTH,
>
> ---
> Andrew Montalenti
> Co-Founder & CTO
> http://parse.ly
>
>
> On Fri, Jul 11, 2014 at 6:38 PM, Gary Malouf <malouf.g...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> We've been banging our heads against the wall trying to get reasonable
>> performance out of a small Storm cluster.
>>
>> Setup after stripping things down to debug:
>>
>> - All servers on EC2 m3.larges
>> - 2 Kestrel 2.4.1 queue servers
>> - 3 Storm servers (1 running ui + nimbus, all running supervisors and
>> thus workers)
>> - 2 workers per instance; each worker gets 2GB of RAM max
>> - 1 topology with 2 KestrelSpouts
>>
>> We measure performance by doing the following:
>>
>> - loading up the queues with a couple million items in each
>> - deploying the topology
>> - pulling up the Storm UI and tracking the changes in ack counts over
>> time on the spouts to compute average throughputs
>>
>> With acking enabled on our spouts, we were getting around 4k
>> messages/second.
>> With acking disabled on our spouts, we were seeing around 6k
>> messages/second.
>>
>> Adding a few bolts with acking quickly brings performance down below 800
>> messages/second -- pretty dreadful. Based on the reports many other
>> people have posted about their Storm clusters, I find these numbers
>> really disappointing. We've tried tuning the worker JVM options and the
>> number of workers/executors with this simple setup but could not squeeze
>> anything more out.
>>
>> Does anyone have any further suggestions about where we should be
>> looking? We are just about ready to pull Storm out of production and
>> roll our own processor.
>>
>> Thanks,
>>
>> Gary
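
For anyone who finds this thread later: Andrew's pass-through trick is easy to set
up. A minimal mock bolt looks something like this -- a sketch against the 0.9-era
backtype.storm API; the class name and the "str" field (StringScheme's output) are
illustrative:

    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;

    public class PassThroughBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            // Re-emit the incoming values unchanged; BaseBasicBolt acks
            // the input tuple automatically after execute() returns.
            collector.emit(input.getValues());
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Match whatever field names the upstream component emits,
            // e.g. "str" for a StringScheme-backed KestrelThriftSpout.
            declarer.declare(new Fields("str"));
        }
    }

Substitute one of these for each real bolt, keep the parallelism and groupings
identical, and compare throughput: if the mock topology is just as slow, the
bottleneck is at the spout or in the acking path rather than in the bolt logic.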