We were having exactly the same problem during our own load testing with HBase. We found that a region would hit its hbase.hstore.blockingStoreFiles limit or its hbase.hregion.memstore.block.multiplier limit. Hitting either of those limits blocks writes to that specific region, and the client has to pause until a compaction comes through and cleans things up.
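
For reference, these are the two knobs involved. They normally live in hbase-site.xml on the region servers; the little Java sketch below is only meant to show the property names and what they control, and the values are placeholders rather than a recommendation:

// Minimal sketch (not our production settings): the two limits that
// were causing our write blocking. Property names are the real HBase
// ones; the values here are just examples to tune for your own load.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BlockingLimitsExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create(); // on 0.20.x: new HBaseConfiguration()

    // Writes to a region block once one of its stores has this many
    // store files still waiting on compaction.
    conf.setInt("hbase.hstore.blockingStoreFiles", 15);

    // Writes also block once the memstore grows past
    // hbase.hregion.memstore.flush.size * this multiplier.
    conf.setInt("hbase.hregion.memstore.block.multiplier", 4);

    System.out.println("blockingStoreFiles = "
        + conf.getInt("hbase.hstore.blockingStoreFiles", -1));
  }
}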

However, the biggest problem was that there would already be a decent-sized compaction queue; we'd hit one of those limits, get put on the *back* of the queue, and then have to wait *minutes* before the compaction we needed to stop the blocking finally ran. I created a JIRA to address the issue: HBASE-2646. There is a patch in the JIRA for 0.20.4 that creates a priority compaction queue, which greatly helped our problem; in fact we saw little to no pausing after applying the patch. In the comments on the JIRA you can see some of the settings we used to mitigate the problem without the patch.
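
To give a rough idea of what the patch does (this is just a sketch of the concept, not the actual HBASE-2646 code, and the CompactionRequest type here is hypothetical): compaction requests carry a priority, and a request for a region that is already blocking writes sorts ahead of routine requests instead of waiting at the back of a FIFO queue.

// Rough sketch of a priority compaction queue using a standard
// PriorityBlockingQueue: a region that is blocking writes jumps the line.
import java.util.concurrent.PriorityBlockingQueue;

public class PriorityCompactionSketch {

  // Hypothetical request type; the real patch works on the region server's
  // own region/store objects.
  static class CompactionRequest implements Comparable<CompactionRequest> {
    final String regionName;
    final int priority;       // lower value = more urgent
    final long enqueueTime;   // tie-breaker so FIFO order holds within a priority

    CompactionRequest(String regionName, int priority) {
      this.regionName = regionName;
      this.priority = priority;
      this.enqueueTime = System.nanoTime();
    }

    @Override
    public int compareTo(CompactionRequest other) {
      int byPriority = Integer.compare(priority, other.priority);
      return byPriority != 0 ? byPriority : Long.compare(enqueueTime, other.enqueueTime);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    PriorityBlockingQueue<CompactionRequest> queue = new PriorityBlockingQueue<>();

    // Routine compactions queued first...
    queue.put(new CompactionRequest("region-A", 10));
    queue.put(new CompactionRequest("region-B", 10));

    // ...but a region that is blocking writes gets a more urgent priority
    // and is handed to the compaction thread next.
    queue.put(new CompactionRequest("region-C-blocking-writes", 1));

    System.out.println("next to compact: " + queue.take().regionName);
  }
}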

Apparently there is some work going on to do concurrent priority compactions (Jonathan Gray has been working on it), but I haven't seen anything in HBase yet and don't know the timeline. My personal opinion is that we should integrate the patch into trunk and use it until the more advanced compactions are implemented.

~Jeff

On 9/10/2010 2:27 AM, Jeff Hammerbacher wrote:
We've been brainstorming some ideas to "smooth out" these performance
lapses, so instead of getting a 10 second period of unavailability, you get
a 30 second period of slower performance, which is usually preferable.

Where is this brainstorming taking place? Could we open a JIRA issue to
capture the brainstorming in public and searchable fashion?


--
Jeff Whiting
Qualtrics Senior Software Engineer
je...@qualtrics.com