Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Stack
On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault < bbeaudrea...@hubspot.com> wrote: > Should this be added as a known issue in the CDH or hbase documentation? It > was a severe performance hit for us, all of our regionservers were sitting > at a few thousand queued requests. > > Let me take

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Is there any update to this? We just upgraded all of our production clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the known issues, did not know about this. Now we are seeing performance issues across all clusters, as we make heavy use of increments. Can we roll forward to

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Stack
Rollback is untested. No fix in 5.5. I was going to work on this now. Where are your counters Bryan? In their own column family or scattered about in a row with other Cell types? St.Ack On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault < bbeaudrea...@hubspot.com> wrote: > Is there any update

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Yea, they are all over the place and called from client and coprocessor code. We ended up having no other option but to rollback, and aside from a few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn), it seems to be working and fixing our problem. On Mon, Nov 30, 2015 at 3:47 PM

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Should this be added as a known issue in the CDH or hbase documentation? It was a severe performance hit for us, all of our regionservers were sitting at a few thousand queued requests. On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault wrote: > Yea, they are all over

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Sorry the second link should be https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579 On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault wrote: > https://gist.github.com/bbeaudreault/2994a748da83d9f75085 > > An active handler: >

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Those log lines have settled down, they may have been related to a cluster-wide forced restart at the time. On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault wrote: > We've been doing more debugging of this and have set up the read vs write > handlers to try to at

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
We've been doing more debugging of this and have set up the read vs write handlers to try to at least segment this away so reads can work. We have pretty beefy servers, and are running with the following settings: hbase.regionserver.handler.count=1000 hbase.ipc.server.read.threadpool.size=50

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Stack
On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault wrote: > The rollback seems to have mostly solved the issue for one of our clusters, > but another one is still seeing long increment times: > > "slowIncrementCount": 52080, > "Increment_num_ops":

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
The rollback seems to have mostly solved the issue for one of our clusters, but another one is still seeing long increment times: "slowIncrementCount": 52080, "Increment_num_ops": 325236, "Increment_min": 1, "Increment_max": 6162, "Increment_mean": 465.68678129112396, "Increment_median": 216,"

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
https://gist.github.com/bbeaudreault/2994a748da83d9f75085 An active handler: https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286 One that is locked: https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579 The difference between

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Stack
Still slow increments though? On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault wrote: > Those log lines have settled down, they may have been related to a > cluster-wide forced restart at the time. > > On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault < >

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Stack
On Mon, Nov 30, 2015 at 9:16 PM, Bryan Beaudreault wrote: > I'll try to get another one. We are currently not seeing the issue due to > lack of contention (it is off hours for our customers). > > Note that the stack trace I gave you was taken with a tool we have which

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
Yea sorry if I was misleading. The nonce loglines we saw only happened on full cluster restart, it may have been the HLog's replaying, not sure. We are still seeing slow Increments. Where Gets and Mutates will be on the order of 50-150ms according to metrics, Increment will be in the 1000-5000ms

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Stack
Looking again, the https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359 thread dump and the https://gist.github.com/bbeaudreault/2994a748da83d9f75085 thread dump are the same? Only have two increments going on in this thread dump: at

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
I'll try to get another one. We are currently not seeing the issue due to lack of contention (it is off hours for our customers). Note that the stack trace I gave you was taken with a tool we have which aggregates common stacks. The one at the bottom occurred 122 times (out of 128 handlers --

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Stack
Looking at that stack trace, nothing showing as blocked or slowed by another operation. You have others I could look at Bryan? St.Ack On Mon, Nov 30, 2015 at 8:40 PM, Bryan Beaudreault wrote: > Yea sorry if I was misleading. The nonce loglines we saw only happened on

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

2015-11-30 Thread Bryan Beaudreault
I didn't think to use the non-aggregated jstack output as it has become second nature for us to use https://github.com/HubSpot/astack/. It rolls up repeating stacktraces. You can see above each stacktrace the number of times it occurred and an estimated cpu time spent. Sorry will try to get it
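
For readers unfamiliar with the idea of rolling up repeating stack traces, here is a minimal sketch of how identical traces in a jstack-style thread dump could be counted. It is not astack's actual implementation, and it does not estimate CPU time. The function name `aggregate_stacks`, the `SAMPLE` dump, and every class and method name inside it are hypothetical, made up for illustration only.

```python
import re
from collections import Counter

# Hypothetical, hand-written dump in the general shape of jstack output.
# The thread names and frames are invented for this example.
SAMPLE = '''
"handler-1" #11 daemon prio=5 tid=0x01 nid=0x1 runnable
   java.lang.Thread.State: RUNNABLE
    at example.Region.increment(Region.java:10)
    at example.Handler.run(Handler.java:20)

"handler-2" #12 daemon prio=5 tid=0x02 nid=0x2 runnable
   java.lang.Thread.State: RUNNABLE
    at example.Region.increment(Region.java:10)
    at example.Handler.run(Handler.java:20)

"handler-3" #13 daemon prio=5 tid=0x03 nid=0x3 waiting
   java.lang.Thread.State: WAITING
    at example.Queue.take(Queue.java:5)
'''


def aggregate_stacks(dump: str) -> Counter:
    """Count how many threads share each distinct sequence of frames."""
    counts = Counter()
    # Assumption: thread entries are separated by blank lines.
    for block in re.split(r"\n\s*\n", dump.strip()):
        lines = [line.strip() for line in block.splitlines()]
        # Keep only frame lines ("at ..."). This drops the per-thread header,
        # the state line, and lock annotations, which differ between threads.
        frames = tuple(line for line in lines if line.startswith("at "))
        if frames:
            counts[frames] += 1
    return counts


if __name__ == "__main__":
    # Print the most common stacks first, each preceded by its thread count.
    for frames, n in aggregate_stacks(SAMPLE).most_common():
        print(f"{n} thread(s):")
        for frame in frames:
            print(f"    {frame}")
```

Because lock annotations are discarded, two threads with the same frames but different lock states are counted together, which is worth keeping in mind when reading an aggregated dump for blocked handlers.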