It looks like I'm talking to myself and answering my own question. lol Maybe this is the issue.
I am doing putAllAsync with a HashMap-based batch. I expect that replacing
this part with a TreeMap will fix it.
Reference: http://apache-ignite-users.70518.x6.nabble.com/putAll-stoping-at-600-entries-td817.html
I will try it and attach a thread dump again if the problem recurs. Thanks a lot.

2017-10-21 22:05 GMT+09:00 김성진 <[email protected]>:

> A similar issue has re-emerged. Looking at Stack Overflow, I found a user
> with the same problem:
> https://stackoverflow.com/questions/45028962/possible-starvation-in-striped-pool-with-deadlock-true-apache-ignite
>
> To summarize, each putAllAsync batch I send contains about 500 entries
> whose map keys are random values of the form Timestamp_a.b.c. Do the
> entries need to be sorted by key before being sent?
>
> 2017-10-21 21:57 GMT+09:00 김성진 <[email protected]>:
>
>> Additionally, the client uses the cache.putAllAsync() call.
>>
>> If you look at the Ignite log, you can see a method call like
>> updateAllAsyncInternal0.
>> Could the client be causing a lock issue by sending cache entries
>> asynchronously like this? :(
>>
>> 2017-10-21 21:06 GMT+09:00 dark <[email protected]>:
>>
>>> Hello, I'm using Ignite 2.0.0, and I would like to ask about a
>>> suspected deadlock.
>>> Our usage pattern is to create a new cache for each time unit and
>>> destroy it after a certain period.
>>>
>>> Example)
>>>
>>> We keep caches covering a 3-minute window, as shown below:
>>>
>>> [00:00_Cache] [00:01_Cache] [00:02_Cache]
>>>
>>> After one minute, we create a new cache [00:03_Cache] and destroy the
>>> oldest cache [00:00_Cache]:
>>>
>>> [00:00_Cache] is destroyed!
>>> [00:03_Cache] is new!
>>>
>>> Current cache list:
>>> [00:01_Cache] [00:02_Cache] [00:03_Cache]
>>>
>>> The reason for this pattern is to remove the data of a given time window
>>> quickly, rather than waiting for cache expiry. From our observations, it
>>> removes a window's data quickly without using much CPU.
>>> We ran in this state for about 5 hours, then took 5 client nodes out of
>>> the topology for a while and brought them back up. About ten minutes
>>> later, a deadlock occurred with the following messages:
>>>
>>> [19:48:51,290][WARN ][grid-timeout-worker-#15%null%][G] Possible
>>> starvation in striped pool.
>>> Thread name: sys-stripe-3-#4%null%
>>> Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE,
>>> topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
>>> msg=GridDhtAtomicUpdateRequest [keys=[KeyCacheObjectImpl [part=179, val
>>> Deadlock: true
>>> Completed: 1054320
>>> Thread [name="sys-stripe-3-#4%null%", id=21, state=BLOCKED, blockCnt=5364,
>>> waitCnt=1261740]
>>> Lock [object=o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCacheEntry@6c7a9d31,
>>> ownerName=sys-stripe-6-#7%null%, ownerId=24]
>>> at o.a.i.i.processors.cache.GridCacheMapEntry.markObsoleteIfEmpty(GridCacheMapEntry.java:2095)
>>> at o.a.i.i.processors.cache.CacheOffheapEvictionManager.touch(CacheOffheapEvictionManager.java:44)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.unlockEntries(GridDhtAtomicCache.java:2896)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1853)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1630)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3016)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:127)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:282)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:277)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:863)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:386)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:308)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:100)
>>> at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:253)
>>> at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1257)
>>> at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:885)
>>> at o.a.i.i.managers.communication.GridIoManager.access$2100(GridIoManager.java:114)
>>> at o.a.i.i.managers.communication.GridIoManager$7.run(GridIoManager.java:802)
>>> at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>> [19:48:51,423][WARN ][grid-timeout-worker-#15%null%][G] Possible
>>> starvation in striped pool.
>>> Thread name: sys-stripe-5-#6%null%
>>> Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE,
>>> topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
>>> msg=GridDhtAtomicUpdateRequest [keys=[KeyCacheObjectImpl [part=541, val
>>> Deadlock: true
>>> Completed: 932925
>>> Thread [name="sys-stripe-5-#6%null%", id=23, state=BLOCKED, blockCnt=5629,
>>> waitCnt=1137576]
>>> Lock [object=o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCacheEntry@449f1914,
>>> ownerName=sys-stripe-6-#7%null%, ownerId=24]
>>> at sun.misc.Unsafe.monitorEnter(Native Method)
>>> at o.a.i.i.util.GridUnsafe.monitorEnter(GridUnsafe.java:1193)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.lockEntries(GridDhtAtomicCache.java:2815)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1741)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1630)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3016)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:127)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:282)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:277)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:863)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:386)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:308)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:100)
>>> at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:253)
>>> at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1257)
>>> at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:885)
>>> at o.a.i.i.managers.communication.GridIoManager.access$2100(GridIoManager.java:114)
>>> at o.a.i.i.managers.communication.GridIoManager$7.run(GridIoManager.java:802)
>>> at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>> Deadlock JMC pictures:
>>> <http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-1.ignite-deadlock-1>
>>> <http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-2.ignite-deadlock-2>
>>> <http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-3.png>
>>>
>>> As you can see in the pictures above, sys-stripe-5 and sys-stripe-6 are
>>> blocked waiting on entry locks owned by each other.
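The two traces above are consistent with a lock-ordering deadlock on an ATOMIC cache: one stripe is blocked in lockEntries() and another in unlockEntries(), each waiting on an entry held by another stripe, because concurrent putAll/putAllAsync batches lock their entries in whatever order the supplied map iterates. Below is a minimal sketch of the remedy mentioned at the top of this thread, copying each batch into a TreeMap so every batch locks keys in the same sorted order. The cache name and method are hypothetical; RollupMetric is the value type from the configuration quoted next.

    import java.util.Map;
    import java.util.TreeMap;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.lang.IgniteFuture;

    public final class SortedBatchPut {

        static IgniteFuture<Void> putBatch(Ignite ignite, Map<String, RollupMetric> rawBatch) {
            // "00:03_Cache" is an illustrative name; use the current window's cache.
            IgniteCache<String, RollupMetric> cache = ignite.cache("00:03_Cache");

            // A TreeMap iterates keys in sorted order, so every concurrent batch
            // acquires the per-entry locks in one consistent order instead of
            // HashMap's arbitrary order, which removes the A-waits-B / B-waits-A cycle.
            Map<String, RollupMetric> sorted = new TreeMap<>(rawBatch);

            return cache.putAllAsync(sorted);
        }
    }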
>>> Besides, the Ignite cache configuration is shown below:
>>>
>>> return ignite.getOrCreateCache(new CacheConfiguration<String, RollupMetric>()
>>>     .setName(cacheName)
>>>     .setCacheMode(CacheMode.PARTITIONED)
>>>     .setAtomicityMode(CacheAtomicityMode.ATOMIC)
>>>     .setRebalanceMode(CacheRebalanceMode.ASYNC)
>>>     .setMemoryPolicyName(MEMORY_POLICY_NAME)
>>>     .setBackups(1)
>>>     .setStatisticsEnabled(true)
>>>     .setManagementEnabled(true)
>>>     .setCopyOnRead(false)
>>>     .setQueryParallelism(20)
>>>     .setLongQueryWarningTimeout(10000) // 10s
>>>     .setEagerTtl(false)
>>>     .setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.DAYS, 365)))
>>>     .setMaxConcurrentAsyncOperations(CacheConfiguration.DFLT_MAX_CONCURRENT_ASYNC_OPS * 10)
>>>     .setAffinity(new CoupangAffinityFunction())
>>>     .setIndexedTypes(String.class, RollupMetric.class));
>>>
>>> The expiry policy is set to 1 year because, as described above, entries
>>> are removed by destroying the whole cache rather than by expiry.
>>>
>>> Ignite memory configuration:
>>>
>>> <property name="memoryConfiguration">
>>>   <bean class="org.apache.ignite.configuration.MemoryConfiguration">
>>>     <property name="memoryPolicies">
>>>       <list>
>>>         <bean class="org.apache.ignite.configuration.MemoryPolicyConfiguration">
>>>           <property name="name" value="RollupMemory"/>
>>>           <property name="pageEvictionMode" value="RANDOM_LRU"/>
>>>           <property name="metricsEnabled" value="true"/>
>>>           <property name="initialSize" value="21474836480"/> <!-- 20 GiB -->
>>>           <property name="maxSize" value="21474836480"/> <!-- 20 GiB -->
>>>         </bean>
>>>       </list>
>>>     </property>
>>>     <property name="pageSize" value="4096"/>
>>>     <property name="concurrencyLevel" value="8"/>
>>>   </bean>
>>> </property>
>>>
>>> For what reason did the deadlock occur? Is there an option or usage
>>> pattern that avoids it?
>>>
>>> I suspect it is due to the client topology changes. If so, how should we
>>> handle that?
>>>
>>> Please let me know if you need any additional information.
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
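For completeness, here is a minimal sketch of the per-minute create-and-destroy rotation described in the quoted mail. The HH:mm_Cache naming and 3-slot window come from the example above; the class, the scheduling, and the bare CacheConfiguration (which stands in for the full configuration shown earlier) are illustrative assumptions, not the poster's actual code.

    import java.time.LocalTime;
    import java.time.format.DateTimeFormatter;
    import java.util.ArrayDeque;
    import java.util.Deque;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.configuration.CacheConfiguration;

    public final class MinuteCacheRotator {

        // Keep a 3-minute window of caches, as in the example above.
        private static final int WINDOW = 3;
        private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("HH:mm");

        private final Ignite ignite;
        private final Deque<String> liveCaches = new ArrayDeque<>();

        MinuteCacheRotator(Ignite ignite) {
            this.ignite = ignite;
        }

        // Call once per minute, e.g. from a scheduled executor.
        void rotate(LocalTime now) {
            String name = FMT.format(now) + "_Cache";

            // Create the cache for the new minute, e.g. [00:03_Cache].
            ignite.getOrCreateCache(new CacheConfiguration<String, Object>(name));
            liveCaches.addLast(name);

            // Destroy the oldest cache once more than WINDOW are live, e.g.
            // [00:00_Cache]. destroyCache() drops the whole window's data in one
            // operation, which is the fast removal the mail relies on instead of expiry.
            while (liveCaches.size() > WINDOW) {
                ignite.destroyCache(liveCaches.removeFirst());
            }
        }
    }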
