On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault < bbeaudrea...@hubspot.com> wrote:
> Should this be added as a known issue in the CDH or hbase documentation? It > was a severe performance hit for us, all of our regionservers were sitting > at a few thousand queued requests. > > Let me take care of that. St.Ack > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault < > bbeaudrea...@hubspot.com> > wrote: > > > Yea, they are all over the place and called from client and coprocessor > > code. We ended up having no other option but to rollback, and aside from > a > > few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn), it > > seems to be working and fixing our problem. > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote: > > > >> Rollback is untested. No fix in 5.5. I was going to work on this now. > >> Where > >> are your counters Bryan? In their own column family or scattered about > in > >> a > >> row with other Cell types? > >> St.Ack > >> > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault < > >> bbeaudrea...@hubspot.com> wrote: > >> > >> > Is there any update to this? We just upgraded all of our production > >> > clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the > >> > known issues, did not not about this. Now we are seeing perfomance > >> issues > >> > across all clusters, as we make heavy use of increments. > >> > > >> > Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back > >> to > >> > CDH 5.3.1 (if that is possible)? > >> > > >> > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <brfrn...@gmail.com> wrote: > >> > > >> > > Thank you St.Ack! > >> > > > >> > > I would like to follow the ticket. > >> > > > >> > > Toshihiro Suzuki > >> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>: > >> > > > >> > > > Back to this problem. Simple tests confirm that as is, the > >> > > > single-queue-backed MVCC instance can slow Region ops if some > other > >> row > >> > > is > >> > > > slow to complete. In particular Increment, checkAndPut, and batch > >> > > mutations > >> > > > are effected. I opened HBASE-14460 to start in on a fix up. Lets > >> see if > >> > > we > >> > > > can somehow scope mvcc to row or at least shard mvcc so not all > >> Region > >> > > ops > >> > > > are paused. > >> > > > > >> > > > St.Ack > >> > > > > >> > > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <brfrn...@gmail.com> wrote: > >> > > > > >> > > > > > Thank you for the below reasoning (with accompanying helpful > >> > > diagram). > >> > > > > > Makes sense. Let me hack up a test case to help with the > >> > > illustration. > >> > > > It > >> > > > > > is as though the mvcc should be scoped to a row only... Writes > >> > > against > >> > > > > > other rows should not hold up my read of my row. Tag an mvcc > >> with a > >> > > > 'row' > >> > > > > > scope so we can see which on-going writes pertain to current > >> > > operation? > >> > > > > Thank you St.Ack! I think this approach would work. > >> > > > > > >> > > > > > You need to read back the increment and have it be 'correct' > at > >> > > > increment > >> > > > > > time? > >> > > > > Yes, we need it. > >> > > > > > >> > > > > I would like to help if there is anything I can do. > >> > > > > > >> > > > > Thanks, > >> > > > > Toshihiro Suzuki > >> > > > > > >> > > > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>: > >> > > > > > >> > > > > > Thank you for the below reasoning (with accompanying helpful > >> > > diagram). > >> > > > > > Makes sense. Let me hack up a test case to help with the > >> > > illustration. > >> > > > It > >> > > > > > is as though the mvcc should be scoped to a row only... Writes > >> > > against > >> > > > > > other rows should not hold up my read of my row. Tag an mvcc > >> with a > >> > > > 'row' > >> > > > > > scope so we can see which on-going writes pertain to current > >> > > operation? > >> > > > > > > >> > > > > > You need to read back the increment and have it be 'correct' > at > >> > > > increment > >> > > > > > time? > >> > > > > > > >> > > > > > (This is a good one) > >> > > > > > > >> > > > > > Thank you Toshihiro Suzuki > >> > > > > > St.Ack > >> > > > > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <brfrn...@gmail.com> > >> wrote: > >> > > > > > > >> > > > > > > St.Ack, > >> > > > > > > > >> > > > > > > Thank you for your response. > >> > > > > > > > >> > > > > > > Why I make out that "A region lock (not a row lock) seems to > >> > occur > >> > > in > >> > > > > > > waitForPreviousTransactionsComplete()" is as follows: > >> > > > > > > > >> > > > > > > A increment operation has 3 procedures for MVCC. > >> > > > > > > > >> > > > > > > 1. mvcc.waitForPreviousTransactionsComplete(); > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712 > >> > > > > > > > >> > > > > > > 2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum); > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721 > >> > > > > > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w, walKey); > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893 > >> > > > > > > > >> > > > > > > > >> > > > > > > I think that MultiVersionConsistencyControl's writeQueue can > >> > cause > >> > > a > >> > > > > > region > >> > > > > > > lock. > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43 > >> > > > > > > > >> > > > > > > > >> > > > > > > Step 2 adds to a WriteEntry to writeQueue. > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108 > >> > > > > > > > >> > > > > > > Step 3 removes the WriteEntry from writeQueue. > >> > > > > > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey) -> > >> > > > > > > waitForPreviousTransactionsComplete(e) -> advanceMemstore(w) > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127 > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235 > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160 > >> > > > > > > > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert() to > >> writeQueue > >> > > and > >> > > > > > waits > >> > > > > > > until writeQueue is empty or writeQueue.getFirst() == w. > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204 > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241 > >> > > > > > > > >> > > > > > > > >> > > > > > > I think when a handler thread is processing between step 2 > and > >> > step > >> > > > 3, > >> > > > > > the > >> > > > > > > other handler threads can wait at step 1 until the thread > >> > completes > >> > > > > step > >> > > > > > 3 > >> > > > > > > This is depicted as follows: > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png > >> > > > > > > > >> > > > > > > > >> > > > > > > Actually, in the thread dump of our region server, many > >> handler > >> > > > threads > >> > > > > > > (RW.default.writeRpcServer.handler) wait at Step 1 > >> > > > > > > (waitForPreviousTransactionsComplete()). > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt > >> > > > > > > > >> > > > > > > Many handler threads wait at this: > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224 > >> > > > > > > > >> > > > > > > > >> > > > > > > > Is it possible you are contending on a counter > post-upgrade? > >> > Is > >> > > it > >> > > > > > > > possible that all these threads are trying to get to the > >> same > >> > row > >> > > > to > >> > > > > > > update > >> > > > > > > > it? Could the app behavior have changed? Or are you > >> thinking > >> > > > > increment > >> > > > > > > > itself has slowed significantly? > >> > > > > > > We have just upgraded HBase, not changed the app behavior. > We > >> are > >> > > > > > thinking > >> > > > > > > increment itself has slowed significantly. > >> > > > > > > Before upgrading HBase, it was good throughput and latency. > >> > > > > > > Currently, to cope with this problem, we split the regions > >> > finely. > >> > > > > > > > >> > > > > > > Thanks, > >> > > > > > > > >> > > > > > > Toshihiro Suzuki > >> > > > > > > > >> > > > > > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>: > >> > > > > > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn...@gmail.com > > > >> > > wrote: > >> > > > > > > > > >> > > > > > > > > Ted, > >> > > > > > > > > > >> > > > > > > > > Thank you for your response. > >> > > > > > > > > > >> > > > > > > > > I uploaded the complete stack trace to Gist. > >> > > > > > > > > > >> > > > > > > > > https://gist.github.com/brfrn169/cb4f2c157129330cd932 > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > I think that increment operation works as follows: > >> > > > > > > > > > >> > > > > > > > > 1. get row lock > >> > > > > > > > > 2. mvcc.waitForPreviousTransactionsComplete() // wait > for > >> all > >> > > > prior > >> > > > > > > MVCC > >> > > > > > > > > transactions to finish > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum() // start a > >> > transaction > >> > > > > > > > > 4. get previous values > >> > > > > > > > > 5. create KVs > >> > > > > > > > > 6. write to Memstore > >> > > > > > > > > 7. write to WAL > >> > > > > > > > > 8. release row lock > >> > > > > > > > > 9. mvcc.completeMemstoreInsertWithSeqNum() // complete > the > >> > > > > > transaction > >> > > > > > > > > > >> > > > > > > > > A instance of MultiVersionConsistencyControl has a > pending > >> > > queue > >> > > > of > >> > > > > > > > writes > >> > > > > > > > > named writeQueue. > >> > > > > > > > > Step 2 puts a WriteEntry w to writeQueue and waits until > >> > > > writeQueue > >> > > > > > is > >> > > > > > > > > empty or writeQueue.getFirst() == w. > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue and step 9 > removes > >> the > >> > > > > > > WriteEntry > >> > > > > > > > > from writeQueue. > >> > > > > > > > > > >> > > > > > > > > I think that when a handler thread is processing between > >> > step 2 > >> > > > and > >> > > > > > > step > >> > > > > > > > 9, > >> > > > > > > > > the other handler threads can wait until the thread > >> completes > >> > > > step > >> > > > > 9. > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > That is right. We need to read, after all outstanding > >> updates > >> > are > >> > > > > > done... > >> > > > > > > > because we need to read the latest update before we go to > >> > > > > > > modify/increment > >> > > > > > > > it. > >> > > > > > > > > >> > > > > > > > How do you make out this? > >> > > > > > > > > >> > > > > > > > "A region lock (not a row lock) seems to occur in > >> > > > > > > > waitForPreviousTransactionsComplete()." > >> > > > > > > > > >> > > > > > > > In 0.98.x we did this: > >> > > > > > > > > >> > > > > > > > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert()); > >> > > > > > > > > >> > > > > > > > ... and in 1.0 we do this: > >> > > > > > > > > >> > > > > > > > mvcc.waitForPreviousTransactionsComplete() which is > this.... > >> > > > > > > > > >> > > > > > > > + public void waitForPreviousTransactionsComplete() { > >> > > > > > > > + WriteEntry w = beginMemstoreInsert(); > >> > > > > > > > + waitForPreviousTransactionsComplete(w); > >> > > > > > > > + } > >> > > > > > > > > >> > > > > > > > The mvcc and region sequenceid were merged in 1.0 ( > >> > > > > > > > https://issues.apache.org/jira/browse/HBASE-8763). > Previous > >> > mvcc > >> > > > and > >> > > > > > > > region > >> > > > > > > > sequenceid would spin independent of each other. Perhaps > >> this > >> > > > > > responsible > >> > > > > > > > for some slow down. > >> > > > > > > > > >> > > > > > > > That said, looking in your thread dump, we seem to be down > >> in > >> > the > >> > > > > Get. > >> > > > > > If > >> > > > > > > > you do a bunch of thread dumps in a row, where is the > >> > > lock-holding > >> > > > > > > thread? > >> > > > > > > > In Get or writing Increment... or waiting on sequence id? > >> > > > > > > > > >> > > > > > > > Is it possible you are contending on a counter > post-upgrade? > >> > Is > >> > > it > >> > > > > > > > possible that all these threads are trying to get to the > >> same > >> > row > >> > > > to > >> > > > > > > update > >> > > > > > > > it? Could the app behavior have changed? Or are you > >> thinking > >> > > > > increment > >> > > > > > > > itself has slowed significantly? > >> > > > > > > > > >> > > > > > > > St.Ack > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > Thanks, > >> > > > > > > > > > >> > > > > > > > > Toshihiro Suzuki > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhih...@gmail.com>: > >> > > > > > > > > > >> > > > > > > > > > In HRegion#increment(), we lock the row (not region): > >> > > > > > > > > > > >> > > > > > > > > > try { > >> > > > > > > > > > rowLock = getRowLock(row); > >> > > > > > > > > > > >> > > > > > > > > > Can you pastebin the complete stack trace ? > >> > > > > > > > > > > >> > > > > > > > > > Thanks > >> > > > > > > > > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 < > >> brfrn...@gmail.com> > >> > > > wrote: > >> > > > > > > > > > > >> > > > > > > > > > > Hi, > >> > > > > > > > > > > > >> > > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6) > to > >> > > > > > > > > > CDH5.4.5(HBase1.0.0) > >> > > > > > > > > > > and we experience slowdown in increment operation. > >> > > > > > > > > > > > >> > > > > > > > > > > Here's an extract from thread dump of the > >> RegionServer of > >> > > our > >> > > > > > > > cluster: > >> > > > > > > > > > > > >> > > > > > > > > > > Thread 68 > >> > > > > > > (RW.default.writeRpcServer.handler=15,queue=5,port=60020): > >> > > > > > > > > > > State: BLOCKED > >> > > > > > > > > > > Blocked count: 21689888 > >> > > > > > > > > > > Waited count: 39828360 > >> > > > > > > > > > > Blocked on java.util.LinkedList@3474e4b2 > >> > > > > > > > > > > Blocked by 63 > >> > > > > > > > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020) > >> > > > > > > > > > > Stack: > >> > > > > > > > > > > java.lang.Object.wait(Native Method) > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224) > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203) > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > >> > > > > > >> > > > >> > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712) > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501) > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570) > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901) > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451) > >> > > > > > > > > > > > >> > > > > > > org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035) > >> > > > > > > > > > > > >> > > > > > > org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107) > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130) > >> > > > > > > > > > > > >> > > > > > > > > >> > > > > org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107) > >> > > > > > > > > > > java.lang.Thread.run(Thread.java:745) > >> > > > > > > > > > > > >> > > > > > > > > > > There are many similar threads in the thread dump. > >> > > > > > > > > > > > >> > > > > > > > > > > I read the source code and I think this is caused by > >> > > changes > >> > > > of > >> > > > > > > > > > > MultiVersionConsistencyControl. > >> > > > > > > > > > > A region lock (not a row lock) seems to occur in > >> > > > > > > > > > > waitForPreviousTransactionsComplete(). > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > Also we wrote performance test code for increment > >> > operation > >> > > > > that > >> > > > > > > > > included > >> > > > > > > > > > > 100 threads and ran it in local mode. > >> > > > > > > > > > > > >> > > > > > > > > > > The result is shown below: > >> > > > > > > > > > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6) > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms): > >> 7.975072509210629 > >> > > > > > > > > > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0) > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms): > 49.11840157868772 > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > Thanks, > >> > > > > > > > > > > > >> > > > > > > > > > > Toshihiro Suzuki > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > > >