From the description, it seems that compactions of a specific subset of regions are taking a long time to complete, occupying all the compaction threads while every other compaction request sits in the queue waiting for the long ones to finish. Once those long compactions are done, the queued ones are processed very quickly. I'm not sure about HBase 1.1.3 (this is quite old, btw), but newer versions do log the total time and the size of the compacted data at the end of each compaction. The only other way the compaction queues would be cleared out is if the RegionServers (RSes) got restarted.
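
To illustrate the pattern, here is a toy simulation of a fixed pool of workers that all start out busy with long jobs while short requests keep arriving. It is only a sketch under assumed numbers (4 threads, long jobs of 100 ticks, one short request per tick), and the function and parameter names are made up for the example. It does not model HBase's actual compaction scheduler.

```python
from collections import deque


def simulate_compaction_queue(threads=4, long_ticks=100, short_ticks=1,
                              arrivals_per_tick=1, total_ticks=150):
    """Return the queue length at the end of every tick (toy model)."""
    # Assumption: every thread starts out busy with one long compaction.
    remaining = [long_ticks] * threads
    queue = deque()
    history = []
    for _ in range(total_ticks):
        # New short compaction requests arrive at a constant rate.
        for _ in range(arrivals_per_tick):
            queue.append(short_ticks)
        for i in range(threads):
            # An idle thread takes the next queued request, if there is one.
            if remaining[i] == 0 and queue:
                remaining[i] = queue.popleft()
            # A busy thread does one tick of work.
            if remaining[i] > 0:
                remaining[i] -= 1
        history.append(len(queue))
    return history


history = simulate_compaction_queue()
peak = max(history)
print("peak queue length:", peak, "at tick", history.index(peak))
print("queue length at the end:", history[-1])
```

In this model the queue grows steadily for as long as the long jobs hold every thread, then drains in a fraction of that time once the threads are free, which matches a metric that climbs for a long while and then falls back to 0 with nothing unusual in the log.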
On Sun, May 19, 2024 at 22:30, Rural Hunter <[email protected]> wrote:
> Hi,
>
> We are experiencing a periodic slow response issue. We investigated the issue
> and found it's related to hfile compaction. The slowdown happens when
> there are many compaction activities in the log. So we tuned some compaction
> parameters and also started to monitor the metric: compactionQueueLength.
> When the slow response happens, we can see the compactionQueueLength keeps
> increasing. In the log there is one item of major compaction completion
> every several minutes. One interesting finding is that compactionQueueLength
> keeps increasing to more than 1000 or even 3000 on some servers until at
> some point it drops to 0 suddenly, like it is cleared by someone. There
> is nothing special in the log at the time and after that there is not much
> compaction activity. I searched the doc and web but couldn't find any
> explanation for that. Can anyone explain what happened? Thanks in advance.
> btw, our hbase version is 1.1.3
>
