Hello,
  When a region server is under stress (hotspotting, a heavy replication load,
call queue sizes hitting their limit, other processes competing with HBase for
resources, etc.), we experience latency spikes for all regions hosted by that
region server. This is somewhat expected in the plain HBase world.

However, with a Phoenix global index, this service deterioration seems to
propagate to many more region servers, because the affected RS hosts some
index regions. The actual data regions are on another RS, and latencies on
that RS spike because it cannot complete its index-update calls to the
stressed server quickly. That second RS then causes issues on yet another one,
and so on.
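
For context, here is a minimal sketch of the kind of schema where we see this
(the table and index names are made up, not our real ones). The point is that
a global index is maintained server-side, so every UPSERT handled by a data
region's RS has to issue index-update RPCs to whichever RS hosts the matching
index region:

    -- Hypothetical schema; regions of EVENTS and IDX_EVENTS_CUSTOMER are
    -- assigned independently by HBase, so they typically land on different RSs.
    CREATE TABLE EVENTS (
        EVENT_ID    BIGINT NOT NULL,
        CUSTOMER_ID VARCHAR,
        EVENT_TIME  TIMESTAMP,
        CONSTRAINT PK PRIMARY KEY (EVENT_ID)
    );

    -- Global (not LOCAL) index: index rows live in their own table and regions.
    CREATE INDEX IDX_EVENTS_CUSTOMER ON EVENTS (CUSTOMER_ID) INCLUDE (EVENT_TIME);

    -- This write blocks on the cross-server index update; if the RS hosting the
    -- index region is stressed, the caller sees the latency on the data RS.
    UPSERT INTO EVENTS (EVENT_ID, CUSTOMER_ID) VALUES (1, 'c-42');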

We've seen this happen on our cluster, and we deal with it by "fixing" the
original RS - splitting regions, restarting it, or moving regions elsewhere,
depending on what the problem is (roughly the shell commands sketched below).
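
To be concrete, these are roughly the HBase shell commands we reach for
(the region and server names below are placeholders, not from our cluster):

    # Split a hot data or index region on the stressed RS:
    split 'EVENTS,\x00\x00\x00\x05,1589500000000.4a1b2c3d4e5f60718293a4b5c6d7e8f9.'

    # Move a region off the stressed RS to a quieter one
    # (encoded region name, then 'host,port,startcode'):
    move '4a1b2c3d4e5f60718293a4b5c6d7e8f9', 'rs-17.example.com,16020,1589400000000'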

Has anyone else experienced this issue? It feels like antithetical behavior for
a distributed system: the cluster breaks down for the very reasons it's
supposed to protect against.

I'd love to hear the Phoenix community's thoughts on this.
