Hi,

I think you should provide the full client and server logs, the configuration files and, if possible, a reproducer for the case where a client node with near cache was able to crash the whole cluster.

This looks like it could be a real issue, and the best way forward would be to file a JIRA ticket for it once the provided data has been analyzed.

BR,
Andrei

On 2019/07/31 14:54:42, Matt Nohelty <n...@gmail.com> wrote:
> Sorry for the long delay in responding to this issue. I will work on
> replicating this issue in a more controlled test environment and try to
> grab thread dumps from there.
>
> In a previous post you mentioned that the blocking in this thread dump
> should only happen when a data node is affected which is usually a server
> node and you also said that near cache consistency is observed
> continuously. If we have near caching enabled, does that mean clients
> become data nodes? If that's the case, does that explain why we are seeing
> blocking when a client crashes or hangs?
>
> Assuming this is related to near caching, is there any configuration to
> adjust this behavior to give us availability over perfect consistency?
> Having a failure on one client ripple across the entire system and
> effectively take down all other clients of that cluster is a major problem.
> We obviously want to avoid problems like an OOM error or a big GC pause in
> the client application but if these things happen we need to be able to
> absorb these gracefully and limit the blast radius to just that client
> node.
>
