Just as a side note, I've seen capacity numbers >6. The calculation for
capacity is somewhat flawed and does not represent a true percentage of
capacity; it's merely a relative measure to compare against your other bolts.
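For reference, a back-of-the-envelope sketch of how the UI appears to derive
the number (this is my reading of the metric, so treat the exact formula as an
assumption): executed tuple count times average execute latency, divided by
the stats window.

```java
// Sketch of the Storm UI "capacity" derivation (assumption: executed count
// times average execute latency over the metrics window, ~10 min by default).
public class CapacitySketch {
    static double capacity(long executed, double avgExecuteLatencyMs, long windowMs) {
        // Fraction of the window the bolt spent in execute(); can exceed 1.0
        // when multiple executors' time is attributed to one bolt.
        return (executed * avgExecuteLatencyMs) / windowMs;
    }

    public static void main(String[] args) {
        // e.g. 1.2M tuples at 0.45 ms each over a 10-minute window
        System.out.println(capacity(1_200_000L, 0.45, 600_000L)); // prints 0.9
    }
}
```

That's why it's only useful relative to your other bolts, not as an absolute
utilization percentage.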

Maybe that's something we can improve, I'll log a JIRA if there isn't
already one.

Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
mich...@fullcontact.com


On Mon, Jun 9, 2014 at 6:58 PM, Jon Logan <jmlo...@buffalo.edu> wrote:

> Are you sure you are looking at the right figure? Capacity should not be >
> 1. High values indicate that you may want to increase parallelism on that
> step. Low values indicate something else is probably bottlenecking your
> topology. If you could send a screenshot of the Storm UI that could be
> helpful.
>
>
> I've had good luck with YourKit...just remotely attach to a running worker.
>
>
> On Mon, Jun 9, 2014 at 8:53 PM, Justin Workman <justinjwork...@gmail.com>
> wrote:
>
>> The capacity indicates they are being utilized. Capacity hovers around
>> .800 and busts to 1.6 or so when we see spikes of tuples or restart the
>> topology.
>>
>> Recommendations on profilers?
>>
>> Sent from my iPhone
>>
>> On Jun 9, 2014, at 6:50 PM, Jon Logan <jmlo...@buffalo.edu> wrote:
>>
>> Are your HBase bolts being saturated? If not, you may want to increase
>> the number of pending tuples, as that could cause things to be artificially
>> throttled.
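For what it's worth, the pending-tuple cap is a per-topology config; a minimal
sketch with the 0.9.x-era API (names from `backtype.storm.Config`; the values
are illustrative assumptions, not recommendations):

```java
import backtype.storm.Config;

// Sketch: raise topology.max.spout.pending so the spout keeps the HBase
// bolts fed without unbounded buffering. Only applies to reliable (acked)
// spouts; the numbers below are placeholders to tune, not recommendations.
Config conf = new Config();
conf.setMaxSpoutPending(2000);  // too low throttles, too high risks tuple timeouts
conf.setMessageTimeoutSecs(60); // give slow HBase writes room before replays
// StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
```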
>>
>> You should also try attaching a profiler to your bolt, and see what's
>> holding it up. Are you doing batched puts (or puts being committed on a
>> separate thread)? That could also cause substantial improvements.
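The batching pattern itself is simple; a self-contained sketch (the HBase call
names mentioned in the comment are from the 0.94/0.98-era client API, so
verify them against your version — the batcher itself is generic):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Accumulate writes and flush every N items in one call, instead of one
// RPC per tuple. With an older HBase client the flush function would be
// roughly puts -> { table.put(puts); table.flushCommits(); }.
public class PutBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> flushFn; // e.g. the HBase write call
    private final List<T> buffer = new ArrayList<>();
    public int flushes = 0; // exposed only to illustrate batching behavior

    PutBatcher(int batchSize, Consumer<List<T>> flushFn) {
        this.batchSize = batchSize;
        this.flushFn = flushFn;
    }

    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) flush();
    }

    void flush() { // call from a tick tuple or on shutdown for the tail
        if (buffer.isEmpty()) return;
        flushFn.accept(new ArrayList<>(buffer));
        buffer.clear();
        flushes++;
    }

    public static void main(String[] args) {
        List<Integer> written = new ArrayList<>();
        PutBatcher<Integer> b = new PutBatcher<>(100, written::addAll);
        for (int i = 0; i < 250; i++) b.add(i);
        b.flush(); // flush the 50-item tail
        System.out.println(b.flushes + " flushes, " + written.size() + " writes");
        // prints: 3 flushes, 250 writes
    }
}
```

The same idea applies if the puts go through a dedicated writer thread: the
bolt's execute() just enqueues, and only the flush pays the round trip.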
>>
>>
>> On Mon, Jun 9, 2014 at 8:11 PM, Justin Workman <justinjwork...@gmail.com>
>> wrote:
>>
>>> In response to a comment from P. Taylor Goetz on another thread..."I
>>> can personally verify that it is possible to process 1.2+ million
>>> (relatively small) messages per second with a 10-15 node cluster — and that
>>> includes writing to HBase, and other components (I don’t have the hardware
>>> specs handy, but can probably dig them up)."
>>>
>>> I would like to know what special knobs people are tuning in both Storm
>>> and HBase to achieve this level of throughput. Things I would be interested
>>> in are HBase cluster sizes, whether the cluster is shared with MapReduce
>>> load as well, bolt parallelism, and any other knobs people have adjusted to
>>> get this level of write throughput to HBase from Storm.
>>>
>>> Maybe this isn't the right group, but we are struggling to get more
>>> than about 2,000 tuples/sec written to HBase. I think I know some of the
>>> bottlenecks, but would love to know what others in the community are tuning
>>> to get this level of performance.
>>>
>>> Our messages are roughly 300-500k, and we are running a 6-node Storm
>>> cluster on virtual machines (our first bottleneck, which we will be
>>> replacing with 10 relatively beefy physical nodes), with a parallelism of
>>> 40 on our storage bolt.
>>>
>>> Any hints on HBase or Storm optimizations that could help increase the
>>> throughput to HBase would be greatly appreciated.
>>>
>>> Thanks
>>> Justin
>>>
>>
>>
>
