Hi,

Yes, you’ve already worked out how to fix the deadlock: pass the keys in
sorted order, for example in a TreeMap, to bulk operations such as putAll
or removeAll. A HashMap iterates its keys in no particular order, so bulk
updates running in parallel can lock the same entries in different orders,
which leads to a deadlock.
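
A minimal sketch of that change, assuming String keys that follow the
Timestamp_a.b.c pattern you described. The class and method names are made
up for illustration, and the Ignite call is left as a comment so the
snippet runs without Ignite on the classpath:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SortedBatch {
    /** Copies a batch into a TreeMap so that its keys iterate in ascending order. */
    static <V> TreeMap<String, V> sortedBatch(Map<String, V> batch) {
        return new TreeMap<>(batch);
    }

    public static void main(String[] args) {
        // Hypothetical batch; a HashMap gives no guarantee about key order.
        Map<String, Integer> batch = new HashMap<>();
        batch.put("1508587200_c.a.b", 3);
        batch.put("1508587200_a.b.c", 1);
        batch.put("1508587200_b.c.a", 2);

        // Every writer now presents its keys in the same (natural String) order.
        TreeMap<String, Integer> sorted = sortedBatch(batch);
        System.out.println(sorted.keySet());

        // With Ignite available, the sorted map would be passed on instead of
        // the HashMap, e.g.: cache.putAllAsync(sorted);
    }
}
```

The copy costs O(n log n) per batch, which should be negligible for
batches of about 500 entries.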

Furthermore, you might consider the ‘eagerTtl’ cache configuration
parameter, combined with an expiry policy, instead of the custom code. That
parameter instructs Ignite to remove expired entries proactively.
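
A rough sketch of such a configuration, reusing the setters from your own
cache configuration. The 3-minute TTL is only my assumption based on the
3-minute cycle you described, the class and method names are hypothetical,
and the snippet needs the Ignite and JCache jars on the classpath:

```java
import java.util.concurrent.TimeUnit;

import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;

import org.apache.ignite.configuration.CacheConfiguration;

public class EagerTtlConfig {
    /** Builds a cache configuration whose entries expire after creation. */
    static CacheConfiguration<String, Object> expiringCache(String cacheName) {
        return new CacheConfiguration<String, Object>()
            .setName(cacheName)
            // Clean up expired entries in the background instead of
            // waiting for them to be touched.
            .setEagerTtl(true)
            // Assumed TTL: 3 minutes after creation; adjust to the real
            // retention period.
            .setExpiryPolicyFactory(
                CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.MINUTES, 3)));
    }
}
```

With this, a single long-lived cache could replace the per-minute
create/destroy cycle, though whether it is as cheap on CPU as destroying
whole caches is something you would need to measure.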

Lastly, upgrade to version 2.2; it’s much more stable than 2.0.

Denis

On Saturday, October 21, 2017, 김성진 <[email protected]> wrote:

> I think I'm talking to myself and giving an answer. lol
>
> Maybe this is the issue.
>
> I am calling putAllAsync with a HashMap. I hope that replacing it with a
> TreeMap will fix this.
>
> Reference:
> http://apache-ignite-users.70518.x6.nabble.com/putAll-stoping-at-600-entries-td817.html
>
> I will try it and post to this thread again if the problem recurs.
>
> Thanks a lot.
>
> 2017-10-21 22:05 GMT+09:00 김성진 <[email protected]>:
>
>> A similar issue has re-emerged. On Stack Overflow, I found a user with a
>> similar problem:
>> https://stackoverflow.com/questions/45028962/possible-starvation-in-striped-pool-with-deadlock-true-apache-ignite
>>
>> To summarize, I pass putAllAsync a Map whose keys are random values
>> following a pattern like Timestamp_a.b.c, about 500 entries at a time.
>> Do the entries have to be sorted by key before they are sent?
>>
>> 2017-10-21 21:57 GMT+09:00 김성진 <[email protected]>:
>>
>>> Additionally, the client uses the cache.putAllAsync() call.
>>>
>>> The Ignite log shows a method call like updateAllAsyncInternal0.
>>> Could the client run into a lock issue when it makes further
>>> asynchronous calls right after sending cache entries? :(
>>>
>>> 2017-10-21 21:06 GMT+09:00 dark <[email protected]>:
>>>
>>>> Hello, I'm using Ignite 2.0.0, and I would like to ask a question
>>>> about a deadlock.
>>>> Our usage pattern is to create a new cache for each time unit and to
>>>> destroy it after a certain period of time.
>>>>
>>>> Example)
>>>>
>>>> We keep one cache per minute, covering a 3-minute cycle, as shown
>>>> below:
>>>>
>>>> [00:00_Cache] [00:01_Cache] [00:02_Cache]
>>>>
>>>> After one minute, we create a new cache [00:03_Cache] and destroy the
>>>> old cache [00:00_Cache].
>>>>
>>>> [00:00_Cache] is destroyed!
>>>> [00:03_Cache] is new!
>>>>
>>>> The current cache list is then:
>>>> [00:01_Cache] [00:02_Cache] [00:03_Cache]
>>>>
>>>> The reason for this design is to remove the data of a given time period
>>>> quickly, rather than relying on cache expiry. From what I observed,
>>>> this removes the data for a time window quickly without using much
>>>> CPU.
>>>> I ran in this state for about 5 hours, then took down the 5 client
>>>> nodes in the topology for a while and brought them back up.
>>>> About ten minutes later, a deadlock occurred with the following
>>>> message.
>>>>
>>>> [19:48:51,290][WARN ][grid-timeout-worker-#15%null%][G] >>> Possible starvation in striped pool.
>>>>     Thread name: sys-stripe-3-#4%null%
>>>>     Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicUpdateRequest [keys=[KeyCacheObjectImpl [part=179, val
>>>>     Deadlock: true
>>>>     Completed: 1054320
>>>> Thread [name="sys-stripe-3-#4%null%", id=21, state=BLOCKED, blockCnt=5364, waitCnt=1261740]
>>>>     Lock [object=o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCacheEntry@6c7a9d31, ownerName=sys-stripe-6-#7%null%, ownerId=24]
>>>>         at o.a.i.i.processors.cache.GridCacheMapEntry.markObsoleteIfEmpty(GridCacheMapEntry.java:2095)
>>>>         at o.a.i.i.processors.cache.CacheOffheapEvictionManager.touch(CacheOffheapEvictionManager.java:44)
>>>>         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.unlockEntries(GridDhtAtomicCache.java:2896)
>>>>         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1853)
>>>>         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1630)
>>>>         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3016)
>>>>         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:127)
>>>>         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:282)
>>>>         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:277)
>>>>         at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:863)
>>>>         at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:386)
>>>>         at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:308)
>>>>         at o.a.i.i.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:100)
>>>>         at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:253)
>>>>         at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1257)
>>>>         at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:885)
>>>>         at o.a.i.i.managers.communication.GridIoManager.access$2100(GridIoManager.java:114)
>>>>         at o.a.i.i.managers.communication.GridIoManager$7.run(GridIoManager.java:802)
>>>>         at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> [19:48:51,423][WARN ][grid-timeout-worker-#15%null%][G] >>> Possible starvation in striped pool.
>>>>     Thread name: sys-stripe-5-#6%null%
>>>>     Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicUpdateRequest [keys=[KeyCacheObjectImpl [part=541, val
>>>>     Deadlock: true
>>>>     Completed: 932925
>>>> Thread [name="sys-stripe-5-#6%null%", id=23, state=BLOCKED, blockCnt=5629, waitCnt=1137576]
>>>>     Lock [object=o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCacheEntry@449f1914, ownerName=sys-stripe-6-#7%null%, ownerId=24]
>>>>         at sun.misc.Unsafe.monitorEnter(Native Method)
>>>>         at o.a.i.i.util.GridUnsafe.monitorEnter(GridUnsafe.java:1193)
>>>>         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.lockEntries(GridDhtAtomicCache.java:2815)
>>>>         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1741)
>>>>         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1630)
>>>>         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3016)
>>>>         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:127)
>>>>         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:282)
>>>>         at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:277)
>>>>         at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:863)
>>>>         at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:386)
>>>>         at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:308)
>>>>         at o.a.i.i.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:100)
>>>>         at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:253)
>>>>         at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1257)
>>>>         at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:885)
>>>>         at o.a.i.i.managers.communication.GridIoManager.access$2100(GridIoManager.java:114)
>>>>         at o.a.i.i.managers.communication.GridIoManager$7.run(GridIoManager.java:802)
>>>>         at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> Deadlock JMC pictures:
>>>> <http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-1.ignite-deadlock-1>
>>>> <http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-2.ignite-deadlock-2>
>>>> <http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-3.png>
>>>>
>>>> As the pictures above show, sys-stripe-5 and sys-stripe-6 are the
>>>> threads that own the locks. The Ignite cache configuration is shown
>>>> below.
>>>>
>>>> return ignite.getOrCreateCache(new CacheConfiguration<String, RollupMetric>()
>>>>             .setName(cacheName)
>>>>             .setCacheMode(CacheMode.PARTITIONED)
>>>>             .setAtomicityMode(CacheAtomicityMode.ATOMIC)
>>>>             .setRebalanceMode(CacheRebalanceMode.ASYNC)
>>>>             .setMemoryPolicyName(MEMORY_POLICY_NAME)
>>>>             .setBackups(1)
>>>>             .setStatisticsEnabled(true)
>>>>             .setManagementEnabled(true)
>>>>             .setCopyOnRead(false)
>>>>             .setQueryParallelism(20)
>>>>             .setLongQueryWarningTimeout(10000) // 10s
>>>>             .setEagerTtl(false)
>>>>             .setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.DAYS, 365)))
>>>>             .setMaxConcurrentAsyncOperations(CacheConfiguration.DFLT_MAX_CONCURRENT_ASYNC_OPS * 10)
>>>>             .setAffinity(new CoupangAffinityFunction())
>>>>             .setIndexedTypes(String.class, RollupMetric.class));
>>>>
>>>> The expiry policy is set to 1 year because entries are removed by
>>>> destroying the whole cache, as described previously.
>>>>
>>>> Ignite Memory Configuration
>>>> <property name="memoryConfiguration">
>>>>       <bean class="org.apache.ignite.configuration.MemoryConfiguration">
>>>>
>>>>         <property name="memoryPolicies">
>>>>           <list>
>>>>             <bean class="org.apache.ignite.configuration.MemoryPolicyConfiguration">
>>>>               <property name="name" value="RollupMemory"/>
>>>>
>>>>               <property name="pageEvictionMode" value="RANDOM_LRU"/>
>>>>               <property name="metricsEnabled" value="true"/>
>>>>
>>>>               <property name="initialSize" value="21474836480"/>
>>>>
>>>>               <property name="maxSize" value="21474836480"/>
>>>>             </bean>
>>>>           </list>
>>>>         </property>
>>>>         <property name="pageSize" value="4096"/>
>>>>         <property name="concurrencyLevel" value="8"/>
>>>>       </bean>
>>>>     </property>
>>>>
>>>> Why did the deadlock occur? Is there an option or usage pattern that
>>>> avoids it?
>>>>
>>>> I suspect it is caused by the client topology changes. If so, how
>>>> should I handle it?
>>>>
>>>> Please let me know if you have any additional questions.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>>
>>>
>>>
>>
>
