>>- There is a spike in compaction time avg time metric. At the same time the >>swap bytes in and swap bytes out also have higher value.
Swapping is bad. You have to avoid it.

-Vlad

On Sun, Nov 1, 2015 at 10:24 AM, Girish Joshi <gjo...@groupon.com.invalid>
wrote:

> Hello,
>
> In my HBase cluster, I observe the following consistently happening over
> several days:
>
> - There is a spike in the compaction avg time metric. At the same time,
>   the swap bytes in and swap bytes out metrics are also higher.
> - Around the same time, the FS PRead and FS Read latencies increase, as
>   do client latencies for random reads.
>
> My HBase cluster consists of 16 nodes, with replication set up to another
> 16-node cluster. Its workload is as follows:
>
> - Around 4 tables have heavy write activity (around 500k writes per
>   second on the m1/m15 moving average). 2 of these tables have atomic
>   counter columns that track analytics data and are incremented with
>   every write.
>
> - 2 tables receive bulk-uploaded data periodically (around once a day).
>
> - We expect reads at around 100k per second, mainly from the tables with
>   bulk-uploaded data and the one with counter columns. The read latencies
>   (p99) spike to around 1000-5000 ms when the compaction avg time metric
>   increases; at other times they are below 100 ms.
>
> I have set hbase.hregion.majorcompaction to 0 on the region servers; I
> plan to set it to 0 on the master nodes too, to rule out time-triggered
> major compactions as the cause. But I suspect many minor compactions are
> happening at the time of the spikes, and that they are escalating into
> major compactions.
>
> *Any suggestions on how to avoid these read-latency spikes and get better
> read performance?*
>
> Thanks,
>
> Girish.
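As a minimal sketch of how to act on that advice, assuming Linux region servers: the script below reads the kernel's current vm.swappiness value, which controls how aggressively pages are swapped out. Keeping it at or near 0 on HBase nodes is a common operational recommendation (not something stated in this thread), so the RegionServer JVM heap is not paged to disk during compactions.

```shell
# Report the current swappiness setting on this host. A high value
# (the default is often 60) makes the kernel more willing to swap
# out process memory, including the RegionServer heap.
swappiness=$(cat /proc/sys/vm/swappiness 2>/dev/null || echo "unknown")
echo "vm.swappiness=$swappiness"

# To lower it persistently (requires root; values are a common
# recommendation, tune for your own cluster):
#   sysctl -w vm.swappiness=0
#   echo 'vm.swappiness = 0' >> /etc/sysctl.conf
```

Correlating the swap bytes in/out metric with this setting should confirm whether the compaction-time spikes coincide with the JVM heap being swapped.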