Just as a side note, I've seen capacity numbers >6. The capacity calculation is somewhat flawed: it does not represent a true percentage utilization, merely a relative measure compared against your other bolts.
Maybe that's something we can improve; I'll log a JIRA if there isn't already one.

Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com

On Mon, Jun 9, 2014 at 6:58 PM, Jon Logan <jmlo...@buffalo.edu> wrote:
> Are you sure you are looking at the right figure? Capacity should not be > 1. High values indicate that you may want to increase parallelism on that step. Low values indicate something else is probably bottlenecking your topology. If you could send a screenshot of the Storm UI, that could be helpful.
>
> I've had good luck with YourKit... just remotely attach to a running worker.
>
> On Mon, Jun 9, 2014 at 8:53 PM, Justin Workman <justinjwork...@gmail.com> wrote:
>> The capacity indicates they are being utilized. Capacity hovers around 0.800 and bursts to 1.6 or so when we see spikes of tuples or restart the topology.
>>
>> Recommendations on profilers?
>>
>> Sent from my iPhone
>>
>> On Jun 9, 2014, at 6:50 PM, Jon Logan <jmlo...@buffalo.edu> wrote:
>> Are your HBase bolts being saturated? If not, you may want to increase the number of pending tuples, as that could cause things to be artificially throttled.
>>
>> You should also try attaching a profiler to your bolt and see what's holding it up. Are you doing batched puts (or puts being committed on a separate thread)? That could also yield substantial improvements.
>>
>> On Mon, Jun 9, 2014 at 8:11 PM, Justin Workman <justinjwork...@gmail.com> wrote:
>>> In response to a comment from P. Taylor Goetz on another thread: "I can personally verify that it is possible to process 1.2+ million (relatively small) messages per second with a 10-15 node cluster, and that includes writing to HBase and other components (I don't have the hardware specs handy, but can probably dig them up)."
>>>
>>> I would like to know what special knobs people are tuning in both Storm and HBase to achieve this level of throughput. Things I would be interested in: HBase cluster sizes, whether the cluster is shared with MapReduce load as well, bolt parallelism, and any other knobs people have adjusted to get this level of write throughput to HBase from Storm.
>>>
>>> Maybe this isn't the right group, but we are struggling to get more than about 2000 tuples/sec written to HBase. I think I know some of the bottlenecks, but would love to know what others in the community are tuning to get this level of performance.
>>>
>>> Our messages are roughly 300-500k and we are running on a 6 node Storm cluster running on virtual machines (our first bottleneck, which we will be replacing with 10 relatively beefy physical nodes), with a parallelism of 40 for our storage bolt.
>>>
>>> Any hints on HBase or Storm optimizations that could help increase the throughput to HBase would be greatly appreciated.
>>>
>>> Thanks,
>>> Justin
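For anyone following along: the "increase the number of pending tuples" suggestion maps to Storm's `topology.max.spout.pending` setting, which caps how many tuples a spout task may have in flight (emitted but not yet acked) at once; if it is set too low, downstream bolts can sit idle waiting for work. A per-topology config fragment might look like this (values are illustrative, not recommendations):

```yaml
# Per-topology Storm settings -- illustrative values only
topology.max.spout.pending: 2000   # cap on un-acked tuples per spout task
topology.message.timeout.secs: 60  # give batched/async HBase writes time to ack
```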
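The "batched puts" idea from the thread can be sketched generically: buffer writes inside the bolt's execute() and flush them as a single round trip once the batch fills (and also on a timer or tick tuple, so a slow trickle still drains). This is a minimal stand-alone sketch, not the thread's actual code; `BatchingWriter` and its flush() are stand-ins for HBase's `HTable.put(List<Put>)`:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of batching writes in a bolt: collect rows in a buffer
// and flush them in one round trip per batch instead of one per tuple.
class BatchingWriter {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private int flushes = 0; // number of round trips made so far

    BatchingWriter(int batchSize) { this.batchSize = batchSize; }

    void write(String row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) flush();
    }

    void flush() {
        if (buffer.isEmpty()) return;
        // In a real bolt this would be table.put(puts), and the buffered
        // tuples would only be ack()ed here, so a failed flush is replayed.
        flushes++;
        buffer.clear();
    }

    int flushCount() { return flushes; }
}

public class Main {
    public static void main(String[] args) {
        BatchingWriter w = new BatchingWriter(100);
        for (int i = 0; i < 250; i++) w.write("row-" + i);
        w.flush(); // drain the partial batch, e.g. on a tick tuple
        System.out.println(w.flushCount()); // 3 round trips instead of 250
    }
}
```

The key design point is the one Jon raises: delay ack()ing tuples until their batch actually flushes, so that Storm's replay mechanism covers a failed batch write.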