Bryan,
Thanks again for the incredibly useful reply.
I have confirmed that callQueueLen is in fact 0, with a max value of 2 over
the last week (according to Ganglia).
hbase.hstore.compaction.max was set to 15 on the nodes, up from a previous value of 7.
Freezes (laggy responses) on the cluster are frequent and affect both reads and
writes. I also noticed that iowait on the nodes spikes.
The cluster swings between working at 100% and serving nothing (requests time
out) for no discernible reason.
Looking through the logs, I see tons of responseTooSlow warnings; they are the
only regular occurrence in the logs:
hbase-hadoop-regionserver-ip-10-230-130-121.us-west-2.compute.internal.log:2014-11-06
03:54:31,640 WARN org.apache.hadoop.ipc.HBaseServer (IPC Server handler 39 on
60020): (responseTooSlow):
{"processingtimems":14573,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@c67b2ac),
rpc version=1, client version=29,
methodsFingerPrint=-540141542","client":"10.231.139.198:57223","starttimems":1415246057066,"queuetimems":20640,"class":"HRegionServer","responsesize":0,"method":"multi"}
hbase-hadoop-regionserver-ip-10-230-130-121.us-west-2.compute.internal.log:2014-11-06
03:54:31,640 WARN org.apache.hadoop.ipc.HBaseServer (IPC Server handler 42 on
60020): (responseTooSlow):
{"processingtimems":45660,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6c034090),
rpc version=1, client version=29,
methodsFingerPrint=-540141542","client":"10.231.21.106:41126","starttimems":1415246025979,"queuetimems":202,"class":"HRegionServer","responsesize":0,"method":"multi"}
hbase-hadoop-regionserver-ip-10-230-130-121.us-west-2.compute.internal.log:2014-11-06
03:54:31,642 WARN org.apache.hadoop.ipc.HBaseServer (IPC Server handler 46 on
60020): (responseTooSlow):
{"processingtimems":14620,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@4fc3bb1f),
rpc version=1, client version=29,
methodsFingerPrint=-540141542","client":"10.230.130.102:54068","starttimems":1415246057021,"queuetimems":27565,"class":"HRegionServer","responsesize":0,"method":"multi"}
hbase-hadoop-regionserver-ip-10-230-130-121.us-west-2.compute.internal.log:2014-11-06
03:54:31,642 WARN org.apache.hadoop.ipc.HBaseServer (IPC Server handler 35 on
60020): (responseTooSlow):
{"processingtimems":13431,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@3b321922),
rpc version=1, client version=29,
methodsFingerPrint=-540141542","client":"10.227.42.252:60493","starttimems":1415246058210,"queuetimems":1134,"class":"HRegionServer","responsesize":0,"method":"multi"}
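
To compare queue time against processing time across many of these warnings, the JSON payload after "(responseTooSlow):" can be parsed directly. What follows is only a minimal sketch, not a definitive tool: it assumes each warning sits on a single line in the raw log file (the entries above are wrapped by the mail client), and the names parse_slow_responses, summarize, and SAMPLE are hypothetical helpers made up for illustration. The two sample lines are the first two entries above, unwrapped.

```python
import json

MARKER = "(responseTooSlow):"

# Two of the entries above, joined back onto single lines (assumption: this
# is how they appear in the raw regionserver log, before mail wrapping).
SAMPLE = [
    '2014-11-06 03:54:31,640 WARN org.apache.hadoop.ipc.HBaseServer '
    '(IPC Server handler 39 on 60020): (responseTooSlow): '
    '{"processingtimems":14573,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@c67b2ac), '
    'rpc version=1, client version=29, methodsFingerPrint=-540141542",'
    '"client":"10.231.139.198:57223","starttimems":1415246057066,'
    '"queuetimems":20640,"class":"HRegionServer","responsesize":0,"method":"multi"}',
    '2014-11-06 03:54:31,640 WARN org.apache.hadoop.ipc.HBaseServer '
    '(IPC Server handler 42 on 60020): (responseTooSlow): '
    '{"processingtimems":45660,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6c034090), '
    'rpc version=1, client version=29, methodsFingerPrint=-540141542",'
    '"client":"10.231.21.106:41126","starttimems":1415246025979,'
    '"queuetimems":202,"class":"HRegionServer","responsesize":0,"method":"multi"}',
]


def parse_slow_responses(lines):
    """Yield the JSON payload of each responseTooSlow line as a dict."""
    for line in lines:
        _, found, payload = line.partition(MARKER)
        if not found:
            continue  # not a responseTooSlow warning
        try:
            yield json.loads(payload.strip())
        except ValueError:
            continue  # truncated or wrapped entry; skip it


def summarize(entries):
    """Return the count and worst-case timings for the parsed entries."""
    entries = list(entries)
    if not entries:
        return None
    return {
        "count": len(entries),
        "max_processing_ms": max(e["processingtimems"] for e in entries),
        "max_queue_ms": max(e["queuetimems"] for e in entries),
    }


summary = summarize(parse_slow_responses(SAMPLE))
print(summary)
```

For a real log, the same functions could be fed an open file handle instead of SAMPLE, since iterating a file yields one line at a time.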
On Nov 6, 2014, at 12:38 PM, Bryan Beaudreault <[email protected]> wrote:
> blockingStoreFiles