Sorry for the long delay in responding to this issue.  I will work on
replicating this issue in a more controlled test environment and try to
grab thread dumps from there.

In a previous post you mentioned that the blocking in this thread dump
should only happen when a data node is affected, which is usually a server
node.  You also said that near cache consistency is observed
continuously.  If we have near caching enabled, does that mean clients
become data nodes?  If that's the case, does that explain why we are seeing
blocking when a client crashes or hangs?

Assuming this is related to near caching, is there any configuration that
adjusts this behavior to favor availability over perfect consistency?
Having a failure on one client ripple across the entire system and
effectively take down all other clients of that cluster is a major problem.
We obviously want to avoid problems like an OOM error or a big GC pause in
the client application, but if these things happen we need to be able to
absorb them gracefully and limit the blast radius to just that client
node.
