Hello,

When a region server is under stress (hotspotting, heavy replication traffic, call queues hitting their size limit, other processes competing with HBase for resources, etc.), we experience latency spikes for all regions hosted by that region server. This is somewhat expected in the plain HBase world.
However, with a Phoenix global index, this service deterioration seems to propagate to many more region servers. The affected RS hosts some index regions, while the corresponding data regions live on another RS; latencies on that second RS spike because it cannot complete its index update calls quickly. That second RS then causes issues on yet another one, and so on.

We've seen this happen on our cluster, and the way we deal with it is by "fixing" the original RS: splitting regions, restarting it, or moving regions around, depending on what the problem is.

Has anyone experienced this issue? It feels like antithetical behavior for a distributed system: the cluster breaking down for the very reasons it's supposed to protect against. I'd love to hear the thoughts of the Phoenix community on this.
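To make the failure mode we're describing concrete, here is a toy model of what we believe is happening (an illustrative sketch only, not Phoenix's actual write path): a fixed handler pool on the data RS, sized like hbase.regionserver.handler.count, where every write blocks on a synchronous index update call to a slow index RS. The pool size, delay, and class names here are all made up for illustration.

    import java.util.concurrent.*;

    /**
     * Toy model of the cascade: the data RS serves writes from a fixed
     * handler pool, but each write also performs a synchronous index
     * update RPC to the index RS. When the index RS is slow, all of the
     * data RS's handlers end up blocked waiting on it, so even writes to
     * healthy regions on the data RS see latency spikes.
     */
    public class HandlerExhaustionDemo {
        // Hypothetical stand-in for hbase.regionserver.handler.count
        static final int HANDLER_COUNT = 4;

        public static void main(String[] args) throws Exception {
            ExecutorService handlers = Executors.newFixedThreadPool(HANDLER_COUNT);

            // Simulate a stressed index RS: every index RPC takes 2 seconds.
            Callable<Void> slowIndexRpc = () -> {
                Thread.sleep(2000);
                return null;
            };

            // Submit more concurrent writes than there are handlers.
            for (int i = 0; i < 8; i++) {
                final int writeId = i;
                final long submitted = System.nanoTime();
                handlers.submit(() -> {
                    // The data write itself is fast, but the handler then
                    // blocks on the synchronous index update.
                    slowIndexRpc.call();
                    long ms = (System.nanoTime() - submitted) / 1_000_000;
                    System.out.printf("write %d completed after %d ms%n", writeId, ms);
                    return null;
                });
            }
            handlers.shutdown();
            handlers.awaitTermination(1, TimeUnit.MINUTES);
            // Writes 4..7 wait for a free handler before even starting:
            // the index RS's slowness has become the data RS's queueing delay.
        }
    }

In this model the last four writes see roughly double the latency of the first four purely from handler queueing, which is the same shape of propagation we observe: the stressed server's delay is re-exported as queueing delay on every server that has to call into it.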