Hello, I'm using Ignite 2.0.0 and have a question about a deadlock.
Our usage pattern is to create a new cache for each time unit and to
destroy it after a certain period of time.
Example)
We create caches that hold the data of a 3-minute window, as shown below:
[00:00_Cache] [00:01_Cache] [00:02_Cache]
After one minute, we create a new cache [00:03_Cache] and destroy the oldest
cache [00:00_Cache].
[00:00_Cache] is destroyed!
[00:03_Cache] is new!
The current cache list is then:
[00:01_Cache] [00:02_Cache] [00:03_Cache]
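To make the rotation above concrete, here is a minimal sketch of the bookkeeping in plain Java. RollingCacheWindow is a hypothetical helper written only for this post, not our production code and not part of Ignite; the comments mark where the real ignite.getOrCreateCache(...) and ignite.destroyCache(...) calls would go.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: tracks which per-minute caches should currently exist. */
final class RollingCacheWindow {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("HH:mm");

    private final int size;                               // number of caches kept alive
    private final Deque<String> live = new ArrayDeque<>(); // oldest cache name first

    RollingCacheWindow(int size) {
        this.size = size;
    }

    /** Builds a name such as "00:03_Cache" for the given minute bucket. */
    static String cacheName(LocalTime bucket) {
        return bucket.format(FMT) + "_Cache";
    }

    /**
     * Registers the cache for a new bucket and returns the name of the cache
     * that dropped out of the window, if any.
     */
    Optional<String> advance(LocalTime bucket) {
        live.addLast(cacheName(bucket));
        // Real code would call ignite.getOrCreateCache(...) for the new name here.
        if (live.size() > size) {
            String expired = live.removeFirst();
            // Real code would call ignite.destroyCache(expired) here.
            return Optional.of(expired);
        }
        return Optional.empty();
    }

    /** Returns a copy of the live cache names, oldest first. */
    List<String> liveCaches() {
        return new ArrayList<>(live);
    }
}
```

With a window size of 3, calling advance for 00:00, 00:01 and 00:02 destroys nothing; the call for 00:03 reports 00:00_Cache as the one to destroy.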
The reason for using this pattern is to remove the data of a given time
period quickly, rather than relying on cache expiry. From what we observed,
it removed the data for a time period quickly without using a lot of CPU.
I ran the cluster in this state for about 5 hours, then took down 5 client
nodes that had been in the topology for a while and brought them back up.
About ten minutes later, a deadlock occurred with the following message.
[19:48:51,290][WARN ][grid-timeout-worker-#15%null%][G] >>> Possible
starvation in striped pool.
Thread name: sys-stripe-3-#4%null%
Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE,
topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
msg=GridDhtAtomicUpdateRequest [keys=[KeyCacheObjectImpl [part=179, val
Deadlock: true
Completed: 1054320
Thread [name="sys-stripe-3-#4%null%", id=21, state=BLOCKED, blockCnt=5364,
waitCnt=1261740]
Lock
[object=o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCacheEntry@6c7a9d31,
ownerName=sys-stripe-6-#7%null%, ownerId=24]
at
o.a.i.i.processors.cache.GridCacheMapEntry.markObsoleteIfEmpty(GridCacheMapEntry.java:2095)
at
o.a.i.i.processors.cache.CacheOffheapEvictionManager.touch(CacheOffheapEvictionManager.java:44)
at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.unlockEntries(GridDhtAtomicCache.java:2896)
at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1853)
at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1630)
at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3016)
at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:127)
at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:282)
at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:277)
at
o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:863)
at
o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:386)
at
o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:308)
at
o.a.i.i.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:100)
at
o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:253)
at
o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1257)
at
o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:885)
at
o.a.i.i.managers.communication.GridIoManager.access$2100(GridIoManager.java:114)
at
o.a.i.i.managers.communication.GridIoManager$7.run(GridIoManager.java:802)
at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
at java.lang.Thread.run(Thread.java:745)
[19:48:51,423][WARN ][grid-timeout-worker-#15%null%][G] >>> Possible
starvation in striped pool.
Thread name: sys-stripe-5-#6%null%
Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE,
topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
msg=GridDhtAtomicUpdateRequest [keys=[KeyCacheObjectImpl [part=541, val
Deadlock: true
Completed: 932925
Thread [name="sys-stripe-5-#6%null%", id=23, state=BLOCKED, blockCnt=5629,
waitCnt=1137576]
Lock
[object=o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCacheEntry@449f1914,
ownerName=sys-stripe-6-#7%null%, ownerId=24]
at sun.misc.Unsafe.monitorEnter(Native Method)
at o.a.i.i.util.GridUnsafe.monitorEnter(GridUnsafe.java:1193)
at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.lockEntries(GridDhtAtomicCache.java:2815)
at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1741)
at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1630)
at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3016)
at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:127)
at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:282)
at
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:277)
at
o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:863)
at
o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:386)
at
o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:308)
at
o.a.i.i.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:100)
at
o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:253)
at
o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1257)
at
o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:885)
at
o.a.i.i.managers.communication.GridIoManager.access$2100(GridIoManager.java:114)
at
o.a.i.i.managers.communication.GridIoManager$7.run(GridIoManager.java:802)
at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
at java.lang.Thread.run(Thread.java:745)
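To cross-check the "Deadlock: true" flag independently of the Ignite log, the JVM's own detector can be queried through the standard java.lang.management API. This is only a sketch: DeadlockProbe is a hypothetical helper name, not an Ignite class, and it would have to run inside the affected server JVM (or be adapted to a remote JMX connection).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists threads the JVM itself considers deadlocked. */
final class DeadlockProbe {
    /** Returns one line per deadlocked thread; empty if the JVM finds none. */
    static List<String> describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is detected
        List<String> out = new ArrayList<>();
        if (ids == null) {
            return out;
        }
        // Request locked monitors and synchronizers so ownership is visible.
        for (ThreadInfo ti : mx.getThreadInfo(ids, true, true)) {
            if (ti == null) {
                continue; // thread terminated between the two calls
            }
            out.add(ti.getThreadName() + " waits for " + ti.getLockName()
                + " held by " + ti.getLockOwnerName());
        }
        return out;
    }
}
```

Calling DeadlockProbe.describeDeadlocks() on the node and printing each line should show the same waiting thread, lock, and owner fields as the warning above, if the JVM agrees that the threads are in a lock cycle.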
JMC screenshots of the deadlock:
<http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-1.ignite-deadlock-1>
<http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-2.ignite-deadlock-2>
<http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-3.png>
As the pictures above show, sys-stripe-5 and sys-stripe-6 appear as the lock
owners. The Ignite cache configuration is shown below.
return ignite.getOrCreateCache(new CacheConfiguration<String, RollupMetric>()
    .setName(cacheName)
    .setCacheMode(CacheMode.PARTITIONED)
    .setAtomicityMode(CacheAtomicityMode.ATOMIC)
    .setRebalanceMode(CacheRebalanceMode.ASYNC)
    .setMemoryPolicyName(MEMORY_POLICY_NAME)
    .setBackups(1)
    .setStatisticsEnabled(true)
    .setManagementEnabled(true)
    .setCopyOnRead(false)
    .setQueryParallelism(20)
    .setLongQueryWarningTimeout(10000) // 10s
    .setEagerTtl(false)
    .setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.DAYS, 365)))
    .setMaxConcurrentAsyncOperations(CacheConfiguration.DFLT_MAX_CONCURRENT_ASYNC_OPS * 10)
    .setAffinity(new CoupangAffinityFunction())
    .setIndexedTypes(String.class, RollupMetric.class));
The expiry policy is set to 1 year above because entries are removed by
destroying the whole cache, as described earlier, rather than by expiry.
Ignite Memory Configuration
<property name="memoryConfiguration">
  <bean class="org.apache.ignite.configuration.MemoryConfiguration">
    <property name="memoryPolicies">
      <list>
        <bean class="org.apache.ignite.configuration.MemoryPolicyConfiguration">
          <property name="name" value="RollupMemory"/>
          <property name="pageEvictionMode" value="RANDOM_LRU"/>
          <property name="metricsEnabled" value="true"/>
          <!-- 21474836480 bytes = 20 GiB -->
          <property name="initialSize" value="21474836480"/>
          <property name="maxSize" value="21474836480"/>
        </bean>
      </list>
    </property>
    <property name="pageSize" value="4096"/>
    <property name="concurrencyLevel" value="8"/>
  </bean>
</property>
Why did the deadlock occur? Is there an option or usage pattern that would
avoid it?
I suspect it is caused by the client topology changes. If so, how should I
handle that?
Please let me know if you have any additional questions.
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/