I already posted this question to Stack Overflow here:
https://stackoverflow.com/questions/55801760/what-happens-in-apache-ignite-when-a-client-gets-disconnected
but this mailing list is probably a more appropriate place for it.

We use Apache Ignite for caching and are seeing some unexpected behavior
across all of the clients of the cluster when one of the clients fails. The
Ignite cluster itself has three servers and there are approximately 12
servers connecting to that cluster as clients. The cluster has persistence
disabled and many of the caches have near caching enabled.

What we are seeing is that when one of the clients fails (out of memory,
high CPU, network connectivity, etc.), threads on all the other clients
block for a period of time. During these times, the Ignite servers
themselves seem fine but I see things like the following in the logs:

Topology snapshot [ver=123, servers=3, clients=11, CPUs=XXX, offheap=XX.XGB, heap=XXX.GB]
Topology snapshot [ver=124, servers=3, clients=10, CPUs=XXX, offheap=XX.XGB, heap=XXX.GB]

The topology itself is clearly changing when a client connects or
disconnects, but is there anything happening internally inside the cluster
that could cause blocking on other clients? I would expect re-balancing of
data when a server disconnects, but not when a client does.
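
To line up the blocking periods with client departures, the snapshot lines
quoted above can be pulled out of a log with something like the following.
This is only a sketch: the class name TopologyLogScan and the method
clientDrops are made up for illustration, and the regex assumes the log
format matches the excerpt above (fields after clients= are ignored).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TopologyLogScan {
    // Assumed format, taken from the log excerpt: ver, servers, clients in that order.
    // \s* after each comma tolerates lines that were wrapped between fields.
    private static final Pattern SNAPSHOT = Pattern.compile(
        "Topology snapshot \\[ver=(\\d+),\\s*servers=(\\d+),\\s*clients=(\\d+)");

    /** Returns one message per snapshot in which the client count dropped. */
    public static List<String> clientDrops(String log) {
        List<String> drops = new ArrayList<>();
        Matcher m = SNAPSHOT.matcher(log);
        int prevClients = -1; // -1 means no snapshot has been seen yet
        while (m.find()) {
            int clients = Integer.parseInt(m.group(3));
            if (prevClients >= 0 && clients < prevClients) {
                drops.add("ver=" + m.group(1) + ": clients " + prevClients + " -> " + clients);
            }
            prevClients = clients;
        }
        return drops;
    }

    public static void main(String[] args) {
        String log =
            "Topology snapshot [ver=123, servers=3, clients=11, CPUs=XXX, "
            + "offheap=XX.XGB, heap=XXX.GB]\n"
            + "Topology snapshot [ver=124, servers=3, clients=10, CPUs=XXX, "
            + "offheap=XX.XGB, heap=XXX.GB]\n";
        // Prints: ver=124: clients 11 -> 10
        clientDrops(log).forEach(System.out::println);
    }
}
```

The timestamps on the matching log lines (not shown in the excerpt) can then
be compared against the times at which the other clients were blocked.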

From a thread dump, I see many threads stuck in the following state:

java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000078a86ff18> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    at org.apache.ignite.internal.util.IgniteUtils.await(IgniteUtils.java:7452)
    at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.awaitAllReplies(GridReduceQueryExecutor.java:1056)
    at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:733)
    at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1339)
    at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
    at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$9.iterator(IgniteH2Indexing.java:1403)
    at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
    at java.lang.Iterable.forEach(Iterable.java:74)
    ...
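
To track how many threads sit in that frame over time, rather than reading
full thread dumps by hand, a small helper run inside a client JVM might
look like the sketch below. It uses only the standard
Thread.getAllStackTraces() call; the class name BlockedQueryThreads and
the method countThreadsIn are hypothetical, and the Ignite class and method
names passed to it are simply copied from the dump above.

```java
import java.util.Map;

public class BlockedQueryThreads {
    /** Counts live threads in this JVM whose stack contains the given class and method. */
    public static long countThreadsIn(String className, String methodName) {
        long count = 0;
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : e.getValue()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    count++;
                    break; // count each thread at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Frame name copied from the thread dump; it is an internal class and
        // may differ between Ignite versions.
        long n = countThreadsIn(
            "org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor",
            "awaitAllReplies");
        System.out.println("threads waiting in awaitAllReplies: " + n);
    }
}
```

Logging this count periodically alongside the topology version would show
whether the number of waiting threads rises right after a client leaves.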

Any ideas, suggestions, or further avenues to investigate would be much
appreciated.
