Hello,

I'm using Ignite 2.0.0 and have a question about a deadlock. Our usage pattern is to create a new cache for each time unit and to destroy it after a certain period of time.

Example: we keep a 3-minute window of data, with one cache per minute, as shown below.

[00:00_Cache] [00:01_Cache] [00:02_Cache]

After one minute, we create a new cache [00:03_Cache] and destroy the oldest cache [00:00_Cache]. The current cache list is then:

[00:01_Cache] [00:02_Cache] [00:03_Cache]

We do this so that the data for a given time period can be removed quickly, rather than relying on cache expiry. From what we observed, this removed the data for a time period quickly without using a lot of CPU.

The cluster ran in this state for about 5 hours. I then took down 5 client nodes that were in the topology, left them down for a while, and brought them back up. About ten minutes later, a deadlock occurred with the following messages.

[19:48:51,290][WARN ][grid-timeout-worker-#15%null%][G] >>> Possible starvation in striped pool.
    Thread name: sys-stripe-3-#4%null%
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicUpdateRequest [keys=[KeyCacheObjectImpl [part=179, val
    Deadlock: true
    Completed: 1054320
Thread [name="sys-stripe-3-#4%null%", id=21, state=BLOCKED, blockCnt=5364, waitCnt=1261740]
    Lock [object=o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCacheEntry@6c7a9d31, ownerName=sys-stripe-6-#7%null%, ownerId=24]
        at o.a.i.i.processors.cache.GridCacheMapEntry.markObsoleteIfEmpty(GridCacheMapEntry.java:2095)
        at o.a.i.i.processors.cache.CacheOffheapEvictionManager.touch(CacheOffheapEvictionManager.java:44)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.unlockEntries(GridDhtAtomicCache.java:2896)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1853)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1630)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3016)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:127)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:282)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:277)
        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:863)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:386)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:308)
        at o.a.i.i.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:100)
        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:253)
        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1257)
        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:885)
        at o.a.i.i.managers.communication.GridIoManager.access$2100(GridIoManager.java:114)
        at o.a.i.i.managers.communication.GridIoManager$7.run(GridIoManager.java:802)
        at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
        at java.lang.Thread.run(Thread.java:745)

[19:48:51,423][WARN ][grid-timeout-worker-#15%null%][G] >>> Possible starvation in striped pool.
    Thread name: sys-stripe-5-#6%null%
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicUpdateRequest [keys=[KeyCacheObjectImpl [part=541, val
    Deadlock: true
    Completed: 932925
Thread [name="sys-stripe-5-#6%null%", id=23, state=BLOCKED, blockCnt=5629, waitCnt=1137576]
    Lock [object=o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCacheEntry@449f1914, ownerName=sys-stripe-6-#7%null%, ownerId=24]
        at sun.misc.Unsafe.monitorEnter(Native Method)
        at o.a.i.i.util.GridUnsafe.monitorEnter(GridUnsafe.java:1193)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.lockEntries(GridDhtAtomicCache.java:2815)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1741)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1630)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3016)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:127)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:282)
        at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:277)
        at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:863)
        at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:386)
        at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:308)
        at o.a.i.i.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:100)
        at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:253)
        at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1257)
        at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:885)
        at o.a.i.i.managers.communication.GridIoManager.access$2100(GridIoManager.java:114)
        at o.a.i.i.managers.communication.GridIoManager$7.run(GridIoManager.java:802)
        at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
        at java.lang.Thread.run(Thread.java:745)

Deadlock screenshots from JMC:

<http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-1.ignite-deadlock-1>
<http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-2.ignite-deadlock-2>
<http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-3.png>

As the screenshots above show, sys-stripe-5 and sys-stripe-6 appear as the lock owners.

The Ignite cache configuration is shown below.

    return ignite.getOrCreateCache(new CacheConfiguration<String, RollupMetric>()
        .setName(cacheName)
        .setCacheMode(CacheMode.PARTITIONED)
        .setAtomicityMode(CacheAtomicityMode.ATOMIC)
        .setRebalanceMode(CacheRebalanceMode.ASYNC)
        .setMemoryPolicyName(MEMORY_POLICY_NAME)
        .setBackups(1)
        .setStatisticsEnabled(true)
        .setManagementEnabled(true)
        .setCopyOnRead(false)
        .setQueryParallelism(20)
        .setLongQueryWarningTimeout(10000) // 10s
        .setEagerTtl(false)
        .setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.DAYS, 365)))
        .setMaxConcurrentAsyncOperations(CacheConfiguration.DFLT_MAX_CONCURRENT_ASYNC_OPS * 10)
        .setAffinity(new CoupangAffinityFunction())
        .setIndexedTypes(String.class, RollupMetric.class));

The expiry policy above is set to 1 year because entries are removed by destroying the whole cache, as described previously, rather than by expiry.
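
For reference, the rotation described earlier can be sketched roughly as follows. This is only an illustration under assumptions, not our production code: the class CacheWindow and its methods are hypothetical names, and the sketch only computes which cache names are live and which one drops out of the window. On a real node, the live names would be passed to ignite.getOrCreateCache(...) with the configuration above, and the expired name to ignite.destroyCache(...).

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes the per-minute cache names for a sliding window.
public class CacheWindow {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("HH:mm");

    /** Cache name for one minute bucket, e.g. "00:03_Cache". */
    public static String cacheName(LocalTime minute) {
        return minute.format(FMT) + "_Cache";
    }

    /** Names of the caches that should exist at 'now', oldest first. */
    public static List<String> liveCaches(LocalTime now, int windowMinutes) {
        List<String> names = new ArrayList<>();
        for (int i = windowMinutes - 1; i >= 0; i--) {
            names.add(cacheName(now.minusMinutes(i)));
        }
        return names;
    }

    /** Name of the cache that drops out of the window when the clock reaches 'now'. */
    public static String expiredCache(LocalTime now, int windowMinutes) {
        return cacheName(now.minusMinutes(windowMinutes));
    }

    public static void main(String[] args) {
        LocalTime now = LocalTime.of(0, 3);
        // On a real node: ignite.destroyCache(expiredCache(now, 3));
        System.out.println("destroy: " + expiredCache(now, 3));
        // On a real node: ignite.getOrCreateCache(...) for each live name.
        System.out.println("live:    " + liveCaches(now, 3));
    }
}
```

With a 3-minute window at 00:03, this marks 00:00_Cache for destruction and keeps 00:01_Cache, 00:02_Cache and 00:03_Cache, which matches the example above.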

The Ignite memory configuration is shown below.

    <property name="memoryConfiguration">
        <bean class="org.apache.ignite.configuration.MemoryConfiguration">
            <property name="memoryPolicies">
                <list>
                    <bean class="org.apache.ignite.configuration.MemoryPolicyConfiguration">
                        <property name="name" value="RollupMemory"/>
                        <property name="pageEvictionMode" value="RANDOM_LRU"/>
                        <property name="metricsEnabled" value="true"/>
                        <property name="initialSize" value="21474836480"/>
                        <property name="maxSize" value="21474836480"/>
                    </bean>
                </list>
            </property>
            <property name="pageSize" value="4096"/>
            <property name="concurrencyLevel" value="8"/>
        </bean>
    </property>

What caused the deadlock? Is there a configuration option or usage pattern that avoids it? I suspect it is related to the client topology changes. If so, how should this be handled?

Please let me know if you need any additional information.

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/