Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-09-08 Thread 鈴木俊裕
Hi, We upgraded our cluster from CDH5.3.1(HBase0.98.6) to CDH5.4.5(HBase1.0.0) and we experienced a slowdown in increment operations. Here's an extract from a thread dump of a RegionServer in our cluster: Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020): State: BLOCKED Blocked

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-09-08 Thread Ted Yu
In HRegion#increment(), we lock the row (not region): try { rowLock = getRowLock(row); Can you pastebin the complete stack trace ? Thanks On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 wrote: > Hi, > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) to CDH5.4.5(HBase1.0.0) > and we experie

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-09-08 Thread 鈴木俊裕
Ted, Thank you for your response. I uploaded the complete stack trace to Gist. https://gist.github.com/brfrn169/cb4f2c157129330cd932 I think that increment operation works as follows: 1. get row lock 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to fini

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-09-08 Thread Stack
On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 wrote: > Ted, > > Thank you for your response. > > I uploaded the complete stack trace to Gist. > > https://gist.github.com/brfrn169/cb4f2c157129330cd932 > > > I think that increment operation works as follows: > > 1. get row lock > 2. mvcc.waitForPreviousTra

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-09-12 Thread 鈴木俊裕
St.Ack, Thank you for your response. My reason for concluding that "A region lock (not a row lock) seems to occur in waitForPreviousTransactionsComplete()" is as follows: An increment operation has 3 procedures for MVCC. 1. mvcc.waitForPreviousTransactionsComplete(); https://github.com/cloudera/hbase/blo

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-09-12 Thread Stack
Thank you for the below reasoning (with accompanying helpful diagram). Makes sense. Let me hack up a test case to help with the illustration. It is as though the mvcc should be scoped to a row only... Writes against other rows should not hold up my read of my row. Tag an mvcc with a 'row' scope so

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-09-14 Thread 鈴木俊裕
> Thank you for the below reasoning (with accompanying helpful diagram). > Makes sense. Let me hack up a test case to help with the illustration. It > is as though the mvcc should be scoped to a row only... Writes against > other rows should not hold up my read of my row. Tag an mvcc with a 'row' >

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-09-21 Thread Stack
Back to this problem. Simple tests confirm that as is, the single-queue-backed MVCC instance can slow Region ops if some other row is slow to complete. In particular Increment, checkAndPut, and batch mutations are affected. I opened HBASE-14460 to start in on a fix up. Let's see if we can somehow sc
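To illustrate the effect described above, here is a minimal toy model of a single-queue MVCC. It is a sketch under stated assumptions, not HBase's implementation: the class ToyMvcc and its methods begin, complete, readPoint and wouldBlock are hypothetical names invented for this example. The only point it makes is that when the read point can advance solely past a contiguous prefix of completed writes, a finished write on one row still waits on an unfinished, earlier write on a different row.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a region-wide, single-queue MVCC. Hypothetical; not HBase's class. */
public class ToyMvcc {
    /** One in-flight write. The row is recorded only to show that it plays no role in ordering. */
    public static final class Entry {
        final long writeNumber;
        final String row;
        boolean completed;

        Entry(long writeNumber, String row) {
            this.writeNumber = writeNumber;
            this.row = row;
        }
    }

    private final Deque<Entry> queue = new ArrayDeque<>();
    private long nextWriteNumber = 1;
    private long readPoint = 0;

    /** Registers a write for any row in the single shared queue. */
    public synchronized Entry begin(String row) {
        Entry e = new Entry(nextWriteNumber++, row);
        queue.addLast(e);
        return e;
    }

    /** Marks a write done. The read point advances only past a contiguous completed prefix. */
    public synchronized void complete(Entry e) {
        e.completed = true;
        while (!queue.isEmpty() && queue.peekFirst().completed) {
            readPoint = queue.removeFirst().writeNumber;
        }
        notifyAll(); // a real waiter would block in wait() until readPoint catches up
    }

    public synchronized long readPoint() {
        return readPoint;
    }

    /** True if a caller waiting for the read point to reach this entry would still block. */
    public synchronized boolean wouldBlock(Entry e) {
        return readPoint < e.writeNumber;
    }

    public static void main(String[] args) {
        ToyMvcc mvcc = new ToyMvcc();
        Entry slowA = mvcc.begin("rowA"); // earlier write, slow to finish
        Entry fastB = mvcc.begin("rowB"); // later write on an unrelated row
        mvcc.complete(fastB);
        System.out.println("rowB blocked behind rowA: " + mvcc.wouldBlock(fastB));
        mvcc.complete(slowA);
        System.out.println("rowB blocked after rowA completes: " + mvcc.wouldBlock(fastB));
    }
}
```

In this model the row name never influences ordering, which is why scoping the wait to a row (as suggested earlier in the thread) would remove the cross-row stall.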

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-09-21 Thread Elliott Clark
Commented up on the jira. But I think there's a pretty easy solution that we can do that should be possible in the near future. We will continue to have issues in situations that are highly contended on just a small number of rows. But there's not a whole lot that I can see to make that situation m

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-09-24 Thread 鈴木俊裕
Thank you St.Ack! I would like to follow the ticket. Toshihiro Suzuki 2015-09-22 14:14 GMT+09:00 Stack : > Back to this problem. Simple tests confirm that as is, the > single-queue-backed MVCC instance can slow Region ops if some other row is > slow to complete. In particular Increment, checkAn

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Is there any update to this? We just upgraded all of our production clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the known issues, did not know about this. Now we are seeing performance issues across all clusters, as we make heavy use of increments. Can we roll forward to CDH5

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Stack
Rollback is untested. No fix in 5.5. I was going to work on this now. Where are your counters Bryan? In their own column family or scattered about in a row with other Cell types? St.Ack On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault < bbeaudrea...@hubspot.com> wrote: > Is there any update to

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Yea, they are all over the place and called from client and coprocessor code. We ended up having no other option but to rollback, and aside from a few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn), it seems to be working and fixing our problem. On Mon, Nov 30, 2015 at 3:47 PM St

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Should this be added as a known issue in the CDH or hbase documentation? It was a severe performance hit for us, all of our regionservers were sitting at a few thousand queued requests. On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault wrote: > Yea, they are all over the place and called from cl

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Stack
On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault < bbeaudrea...@hubspot.com> wrote: > Should this be added as a known issue in the CDH or hbase documentation? It > was a severe performance hit for us, all of our regionservers were sitting > at a few thousand queued requests. > > Let me take car

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
The rollback seems to have mostly solved the issue for one of our clusters, but another one is still seeing long increment times: "slowIncrementCount": 52080, "Increment_num_ops": 325236, "Increment_min": 1, "Increment_max": 6162, "Increment_mean": 465.68678129112396, "Increment_median": 216, "Increm
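As a quick reading of the metrics quoted above, the following sketch computes the share of increments counted as slow and the mean-to-median ratio. The class and method names are hypothetical, and it assumes that slowIncrementCount counts operations exceeding the server's slow-operation threshold, which the message itself does not state.

```java
/** Arithmetic on the quoted metrics only; class and method names are hypothetical. */
public class SlowIncrementRatio {
    /** Fraction of operations that were counted as slow. */
    public static double slowFraction(long slowCount, long numOps) {
        if (numOps <= 0) {
            throw new IllegalArgumentException("numOps must be positive");
        }
        return (double) slowCount / numOps;
    }

    public static void main(String[] args) {
        // Values copied from the message: slowIncrementCount and Increment_num_ops.
        double fraction = slowFraction(52080L, 325236L);
        System.out.printf("slow increments: %.1f%% of all increments%n", fraction * 100);

        // Increment_mean divided by Increment_median; a ratio well above 1 suggests a long tail.
        System.out.printf("mean/median: %.2f%n", 465.68678129112396 / 216);
    }
}
```

Roughly one increment in six was counted as slow, and the mean is more than twice the median, which is consistent with a minority of very long stalls rather than a uniform slowdown.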

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Stack
On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault wrote: > The rollback seems to have mostly solved the issue for one of our clusters, > but another one is still seeing long increment times: > > "slowIncrementCount": 52080, > "Increment_num_ops": 325236,"Increment_min": 1,"Increment_max": 6162,"

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
https://gist.github.com/bbeaudreault/2994a748da83d9f75085 An active handler: https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286 One that is locked: https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579 The difference between pre-rollbac

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Sorry the second link should be https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579 On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault wrote: > https://gist.github.com/bbeaudreault/2994a748da83d9f75085 > > An active handler: > https://gist.github.com/bbeaudreault/299

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
We've been doing more debugging of this and have set up the read vs write handlers to try to at least segment this away so reads can work. We have pretty beefy servers, and are running with the following settings: hbase.regionserver.handler.count=1000 hbase.ipc.server.read.threadpool.size=50 hbase

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Those log lines have settled down, they may have been related to a cluster-wide forced restart at the time. On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault wrote: > We've been doing more debugging of this and have set up the read vs write > handlers to try to at least segment this away so read

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Stack
Still slow increments though? On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault wrote: > Those log lines have settled down, they may have been related to a > cluster-wide forced restart at the time. > > On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault < > bbeaudrea...@hubspot.com> > wrote: > >

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Yea sorry if I was misleading. The nonce loglines we saw only happened on full cluster restart, it may have been the HLog's replaying, not sure. We are still seeing slow Increments. Where Gets and Mutates will be on the order of 50-150ms according to metrics, Increment will be in the 1000-5000ms

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Stack
Looking at that stack trace, nothing showing as blocked or slowed by another operation. You have others I could look at Bryan? St.Ack On Mon, Nov 30, 2015 at 8:40 PM, Bryan Beaudreault wrote: > Yea sorry if I was misleading. The nonce loglines we saw only happened on > full cluster restart, it

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
I'll try to get another one. We are currently not seeing the issue due to lack of contention (it is off hours for our customers). Note that the stack trace I gave you was taken with a tool we have which aggregates common stacks. The one at the bottom occurred 122 times (out of 128 handlers -- thi

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Stack
Looking again, the https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359 thread dump and the https://gist.github.com/bbeaudreault/2994a748da83d9f75085 thread dump are the same? Only have two increments going on in this thread dump: at org.apache.hadoop.hbase.KeyValue.ma

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
I didn't think to use the non-aggregated jstack output as it has become second nature for us to use https://github.com/HubSpot/astack/. It rolls up repeating stacktraces. You can see above each stacktrace the number of times it occurred and an estimated cpu time spent. Sorry will try to get it wi

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Stack
On Mon, Nov 30, 2015 at 9:16 PM, Bryan Beaudreault wrote: > I'll try to get another one. We are currently not seeing the issue due to > lack of contention (it is off hours for our customers). > > Note that the stack trace I gave you was taken with a tool we have which > aggregates common stacks.

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-12-13 Thread Stack
On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 wrote: > ... > > Also we wrote performance test code for increment operation that included > 100 threads and ran it in local mode. > > The result is shown below: > > CDH5.3.1(HBase0.98.6) > Throughput(op/s): 12757, Latency(ms): 7.975072509210629 > > CDH5.4.5(H
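For readers who want to reproduce this kind of measurement, here is a minimal sketch of a multi-threaded harness that reports throughput (op/s) and mean latency (ms). It is not the test code referenced in this thread: the class IncrementBench, its run method and the Result type are hypothetical, and an in-memory AtomicLong stands in for the HBase increment call so that the example is self-contained.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical micro-benchmark harness; the operation under test is supplied by the caller. */
public class IncrementBench {
    public static final class Result {
        public final long ops;
        public final double throughputPerSec;
        public final double meanLatencyMs;

        Result(long ops, double throughputPerSec, double meanLatencyMs) {
            this.ops = ops;
            this.throughputPerSec = throughputPerSec;
            this.meanLatencyMs = meanLatencyMs;
        }
    }

    /** Runs op opsPerThread times on each of the given number of threads and times it. */
    public static Result run(int threads, int opsPerThread, Runnable op) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        AtomicLong totalLatencyNanos = new AtomicLong();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers at the same moment
                    for (int i = 0; i < opsPerThread; i++) {
                        long t0 = System.nanoTime();
                        op.run();
                        totalLatencyNanos.addAndGet(System.nanoTime() - t0);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        long begin = System.nanoTime();
        start.countDown();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        } finally {
            pool.shutdown();
        }
        double elapsedSec = (System.nanoTime() - begin) / 1e9;
        long ops = (long) threads * opsPerThread;
        return new Result(ops, ops / elapsedSec, totalLatencyNanos.get() / 1e6 / ops);
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong(); // stand-in for a real increment against a table
        Result r = run(100, 10_000, counter::incrementAndGet);
        System.out.printf("Throughput(op/s): %.0f, Latency(ms): %f%n",
                r.throughputPerSec, r.meanLatencyMs);
    }
}
```

To measure a real cluster, the Runnable would wrap a client increment call instead of the AtomicLong. With a fixed number of closed-loop threads, throughput is approximately the thread count divided by mean latency; the CDH5.3.1 figures quoted above fit that relation (100 threads / 0.007975 s is about 12,500 op/s).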

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-12-21 Thread 鈴木俊裕
St.Ack I am sorry for the late reply. This is the test code: https://github.com/brfrn169/hbase-test We applied the patch you can find below to HBase-1.0.0 to resolve the performance degradation: https://gist.github.com/brfrn169/15a874594be2fb9d6ea0 It showed good performance. I think the dir

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-12-21 Thread Stack
On Mon, Dec 21, 2015 at 2:31 AM, 鈴木俊裕 wrote: > St.Ack > > I am sorry for the late reply. > > Thank you for the reply. > This is the test code: > https://github.com/brfrn169/hbase-test This helps. The test has a different character to others we currently have (a thread keeps writing its row r

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2016-02-08 Thread Stack
Let me close out this thread. Below is the release note from the HBASE-14460 umbrella increments regression issue and then some. Increments, appends, checkAnd* have been slow since hbase-1.0.0. The unification of mvcc and sequence id done by HBASE-8763 was responsible. A ‘fast-path’ workaround

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2016-02-08 Thread Bryan Beaudreault
This is great news! Thanks for all of the hard work here, we're excited to put this issue behind us and are happy to see the lesson around improving perf testing. Cheers, Bryan On Mon, Feb 8, 2016 at 1:55 PM Stack wrote: > Let me close out this thread. > > Below is the release note from the HB