Hello! Unfortunately it's hard to say what happened here: the oldest log already starts with error messages, so any root cause has been clobbered.
Do you have persistence enabled in this cluster? If yes, make sure the OS is not swapping Ignite memory out to disk, and that your disks are not overwhelmed with updates. If not, please make sure you are not running out of data region space; see the sketch below for one way to check.
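A minimal sketch, assuming a Java-based configuration against the Ignite 2.7.x API — the region name and sizes are illustrative assumptions, not values from your cluster. It pins down the default data region explicitly and enables its metrics so you can see whether the region is filling up:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class DataRegionCheck {
    public static void main(String[] args) {
        // Illustrative sizes -- replace with your actual capacity planning.
        DataRegionConfiguration dfltRegion = new DataRegionConfiguration()
            .setName("Default_Region")               // hypothetical region name
            .setInitialSize(512L * 1024 * 1024)      // 512 MiB initial
            .setMaxSize(4L * 1024 * 1024 * 1024)     // 4 GiB hard cap
            .setPersistenceEnabled(true)             // set false for pure in-memory
            .setMetricsEnabled(true);                // expose region usage metrics

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(dfltRegion));

        try (Ignite ignite = Ignition.start(cfg)) {
            // With metrics enabled, region usage is visible at runtime:
            ignite.dataRegionMetrics().forEach(m ->
                System.out.printf("region=%s allocatedPages=%d fillFactor=%.2f%n",
                    m.getName(), m.getTotalAllocatedPages(), m.getPagesFillFactor()));
        }
    }
}

If the allocated page count keeps climbing toward the configured maximum while the fill factor stays high, the region is exhausted: raise the max size, enable persistence, or configure page eviction for that region.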
Regards,
--
Ilya Kasnacheev

Wed, 9 Oct 2019 at 13:50, ihalilaltun <ibrahim.al...@segmentify.com>:

> Hi there,
>
> We had an unresponsive cluster today after the following error:
>
> [2019-10-09T07:08:13,623][ERROR][sys-stripe-94-#95][GridCacheDatabaseSharedManager] Checkpoint read lock acquisition has been timed out.
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$CheckpointReadLockTimeoutException: Checkpoint read lock acquisition has been timed out.
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.failCheckpointReadLock(GridCacheDatabaseSharedManager.java:1564) ~[ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1497) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1739) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3138) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:135) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:271) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:266) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:505) [ignite-core-2.7.6.jar:2.7.6]
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.7.6.jar:2.7.6]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
>
> After this log the cluster somehow went into an infinite loop and became
> unresponsive. Since the log files are bigger than 5 MB, I am sharing a
> Google Drive link for all of them:
>
> https://drive.google.com/drive/folders/1XHaw2YZq3_F4CMw8m_mJZkUz1K17njU9?usp=sharing
>
> Any help is appreciated.
>
> Thanks
>
> -----
> İbrahim Halil Altun
> Senior Software Engineer @ Segmentify
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/