On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
bbeaudrea...@hubspot.com> wrote:
> Should this be added as a known issue in the CDH or hbase documentation? It
> was a severe performance hit for us, all of our regionservers were sitting
> at a few thousand queued requests.
>
>
Let me take a look.
Is there any update to this? We just upgraded all of our production
clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the
known issues, did not know about it. Now we are seeing performance issues
across all clusters, as we make heavy use of increments.
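(For context, the kind of client-side increment involved is roughly the
sketch below; this assumes the HBase 1.x client API, and the table, family,
and qualifier names are made up:)

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class CounterBump {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Table table = conn.getTable(TableName.valueOf("counters"))) {
        // Atomic server-side increment of a counter cell by 1; each call
        // takes the regionserver increment path discussed in this thread.
        table.incrementColumnValue(Bytes.toBytes("some-row"),
            Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1L);
      }
    }
  }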
Can we roll forward to 5.5?
Rollback is untested. No fix in 5.5. I was going to work on this now. Where
are your counters Bryan? In their own column family or scattered about in a
row with other Cell types?
St.Ack
On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
bbeaudrea...@hubspot.com> wrote:
> Is there any update
Yea, they are all over the place and called from client and coprocessor
code. We ended up having no other option but to roll back, and aside from a
few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn), it
seems to be working and fixing our problem.
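(For anyone else hitting this on a rollback, the API difference is roughly
the sketch below; family/qualifier names are hypothetical:)

  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class PutCompat {
    static Put buildPut() {
      Put put = new Put(Bytes.toBytes("some-row"));
      // CDH4-era (HBase 0.94) client code used:
      //   put.add(family, qualifier, value)
      // The HBase 1.x client renames this to addColumn, hence the
      // NoSuchMethodErrors when code and client versions are mixed:
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),
          Bytes.toBytes("v"));
      return put;
    }
  }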
On Mon, Nov 30, 2015 at 3:47 PM, Bryan Beaudreault wrote:
Should this be added as a known issue in the CDH or hbase documentation? It
was a severe performance hit for us, all of our regionservers were sitting
at a few thousand queued requests.
On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault
wrote:
> Yea, they are all over
Sorry the second link should be
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault
wrote:
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
>
> An active handler:
>
Those log lines have settled down, they may have been related to a
cluster-wide forced restart at the time.
On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault
wrote:
> We've been doing more debugging of this and have set up the read vs write
> handlers to try to at
We've been doing more debugging of this and have set up the read vs write
handlers to try to at least segment this away so reads can work. We have
pretty beefy servers, and are running with the following settings:
hbase.regionserver.handler.count=1000
hbase.ipc.server.read.threadpool.size=50
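(Equivalently, a sketch of the same knobs set programmatically; the usual
home for these is hbase-site.xml:)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class RpcTuning {
    public static Configuration tuned() {
      Configuration conf = HBaseConfiguration.create();
      // Total RPC handler threads per regionserver.
      conf.setInt("hbase.regionserver.handler.count", 1000);
      // Reader threads that pull requests off client connections
      // before they are queued for the handlers.
      conf.setInt("hbase.ipc.server.read.threadpool.size", 50);
      return conf;
    }
  }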
On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault wrote:
> The rollback seems to have mostly solved the issue for one of our clusters,
> but another one is still seeing long increment times:
>
> "slowIncrementCount": 52080,
> "Increment_num_ops":
The rollback seems to have mostly solved the issue for one of our clusters,
but another one is still seeing long increment times:
"slowIncrementCount": 52080,
"Increment_num_ops": 325236,"Increment_min": 1,"Increment_max": 6162,"
Increment_mean": 465.68678129112396,"Increment_median": 216,"
https://gist.github.com/bbeaudreault/2994a748da83d9f75085
An active handler:
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
One that is locked:
https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
The difference between
Still slow increments though?
On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault wrote:
> Those log lines have settled down, they may have been related to a
> cluster-wide forced restart at the time.
>
> On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault <
>
On Mon, Nov 30, 2015 at 9:16 PM, Bryan Beaudreault wrote:
> I'll try to get another one. We are currently not seeing the issue due to
> lack of contention (it is off hours for our customers).
>
> Note that the stack trace I gave you was taken with a tool we have which
Yea, sorry if I was misleading. The nonce log lines we saw only happened on
full cluster restart; it may have been the HLogs replaying, not sure.
We are still seeing slow Increments. Where Gets and Mutates will be on the
order of 50-150ms according to metrics, Increment will be in the
1000-5000ms range.
Looking again, the
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
thread dump and the https://gist.github.com/bbeaudreault/2994a748da83d9f75085
thread dump are the same? Only have two increments going on in this thread
dump:
I'll try to get another one. We are currently not seeing the issue due to
lack of contention (it is off hours for our customers).
Note that the stack trace I gave you was taken with a tool we have which
aggregates common stacks. The one at the bottom occurred 122 times (out of
128 handlers --
Looking at that stack trace, nothing showing as blocked or slowed by
another operation. You have others I could look at Bryan?
St.Ack
On Mon, Nov 30, 2015 at 8:40 PM, Bryan Beaudreault wrote:
> Yea sorry if I was misleading. The nonce loglines we saw only happened on
I didn't think to use the non-aggregated jstack output, as it has become
second nature for us to use https://github.com/HubSpot/astack/.
It rolls up repeating stacktraces. You can see above each stacktrace the
number of times it occurred and an estimated CPU time spent. Sorry, will
try to get it