It looks like I'm talking to myself and answering my own question. lol Maybe this is the issue.
I am doing putAllAsync with a HashMap-based batch. I expect that replacing
this part with a TreeMap will fix it.
Reference: http://apache-ignite-users.70518.x6.nabble.com/putAll-stoping-at-600-entries-td817.html
I will try it and attach a thread dump again if the problem recurs. Thanks a lot.

2017-10-21 22:05 GMT+09:00 김성진 <[email protected]>:

> A similar issue has re-emerged. Looking at Stack Overflow, I found a user
> with the same problem:
> https://stackoverflow.com/questions/45028962/possible-starvation-in-striped-pool-with-deadlock-true-apache-ignite
>
> To summarize, each putAllAsync batch I send contains about 500 entries
> whose map keys are random values of the form Timestamp_a.b.c. Do the
> entries need to be sorted by key before being sent?
>
> 2017-10-21 21:57 GMT+09:00 김성진 <[email protected]>:
>
>> Additionally, the client uses the cache.putAllAsync() call.
>>
>> If you look at the Ignite log, you can see a method call like
>> updateAllAsyncInternal0.
>> Could the client be causing a lock issue by sending cache entries
>> asynchronously like this? :(
>>
>> 2017-10-21 21:06 GMT+09:00 dark <[email protected]>:
>>
>>> Hello, I'm using Ignite 2.0.0, and I would like to ask about a
>>> suspected deadlock.
>>> Our usage pattern is to create a new cache for each time unit and
>>> destroy it after a certain period.
>>>
>>> Example)
>>>
>>> We keep caches covering a 3-minute window, as shown below:
>>>
>>> [00:00_Cache] [00:01_Cache] [00:02_Cache]
>>>
>>> After one minute, we create a new cache [00:03_Cache] and destroy the
>>> oldest cache [00:00_Cache]:
>>>
>>> [00:00_Cache] is destroyed!
>>> [00:03_Cache] is new!
>>>
>>> Current cache list:
>>> [00:01_Cache] [00:02_Cache] [00:03_Cache]
>>>
>>> The reason for this pattern is to remove the data of a given time window
>>> quickly, rather than waiting for cache expiry. From our observations, it
>>> removes a window's data quickly without using much CPU.
>>> We ran in this state for about 5 hours, then took 5 client nodes out of
>>> the topology for a while and brought them back up. About ten minutes
>>> later, a deadlock occurred with the following messages:
>>>
>>> [19:48:51,290][WARN ][grid-timeout-worker-#15%null%][G] Possible
>>> starvation in striped pool.
>>> Thread name: sys-stripe-3-#4%null%
>>> Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE,
>>> topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
>>> msg=GridDhtAtomicUpdateRequest [keys=[KeyCacheObjectImpl [part=179, val
>>> Deadlock: true
>>> Completed: 1054320
>>> Thread [name="sys-stripe-3-#4%null%", id=21, state=BLOCKED, blockCnt=5364,
>>> waitCnt=1261740]
>>> Lock [object=o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCacheEntry@6c7a9d31,
>>> ownerName=sys-stripe-6-#7%null%, ownerId=24]
>>> at o.a.i.i.processors.cache.GridCacheMapEntry.markObsoleteIfEmpty(GridCacheMapEntry.java:2095)
>>> at o.a.i.i.processors.cache.CacheOffheapEvictionManager.touch(CacheOffheapEvictionManager.java:44)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.unlockEntries(GridDhtAtomicCache.java:2896)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1853)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1630)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3016)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:127)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:282)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:277)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:863)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:386)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:308)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:100)
>>> at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:253)
>>> at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1257)
>>> at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:885)
>>> at o.a.i.i.managers.communication.GridIoManager.access$2100(GridIoManager.java:114)
>>> at o.a.i.i.managers.communication.GridIoManager$7.run(GridIoManager.java:802)
>>> at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>> [19:48:51,423][WARN ][grid-timeout-worker-#15%null%][G] Possible
>>> starvation in striped pool.
>>> Thread name: sys-stripe-5-#6%null%
>>> Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE,
>>> topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
>>> msg=GridDhtAtomicUpdateRequest [keys=[KeyCacheObjectImpl [part=541, val
>>> Deadlock: true
>>> Completed: 932925
>>> Thread [name="sys-stripe-5-#6%null%", id=23, state=BLOCKED, blockCnt=5629,
>>> waitCnt=1137576]
>>> Lock [object=o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCacheEntry@449f1914,
>>> ownerName=sys-stripe-6-#7%null%, ownerId=24]
>>> at sun.misc.Unsafe.monitorEnter(Native Method)
>>> at o.a.i.i.util.GridUnsafe.monitorEnter(GridUnsafe.java:1193)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.lockEntries(GridDhtAtomicCache.java:2815)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1741)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1630)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3016)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:127)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:282)
>>> at o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$6.apply(GridDhtAtomicCache.java:277)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:863)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:386)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:308)
>>> at o.a.i.i.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:100)
>>> at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:253)
>>> at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1257)
>>> at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:885)
>>> at o.a.i.i.managers.communication.GridIoManager.access$2100(GridIoManager.java:114)
>>> at o.a.i.i.managers.communication.GridIoManager$7.run(GridIoManager.java:802)
>>> at o.a.i.i.util.StripedExecutor$Stripe.run(StripedExecutor.java:483)
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>> Deadlock JMC pictures:
>>> <http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-1.ignite-deadlock-1>
>>> <http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-2.ignite-deadlock-2>
>>> <http://apache-ignite-users.70518.x6.nabble.com/file/t1415/ignite-deadlock-3.png>
>>>
>>> As you can see in the pictures above, sys-stripe-5 and sys-stripe-6 are
>>> blocked waiting on entry locks owned by each other.
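The two traces above are consistent with a lock-ordering deadlock on an ATOMIC cache: one stripe is blocked in lockEntries() and another in unlockEntries(), each waiting on an entry held by another stripe, because concurrent putAll/putAllAsync batches lock their entries in whatever order the supplied map iterates. Below is a minimal sketch of the remedy mentioned at the top of this thread, copying each batch into a TreeMap so every batch locks keys in the same sorted order. The cache name and method are hypothetical; RollupMetric is the value type from the configuration quoted next.

    import java.util.Map;
    import java.util.TreeMap;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.lang.IgniteFuture;

    public final class SortedBatchPut {

        static IgniteFuture<Void> putBatch(Ignite ignite, Map<String, RollupMetric> rawBatch) {
            // "00:03_Cache" is an illustrative name; use the current window's cache.
            IgniteCache<String, RollupMetric> cache = ignite.cache("00:03_Cache");

            // A TreeMap iterates keys in sorted order, so every concurrent batch
            // acquires the per-entry locks in one consistent order instead of
            // HashMap's arbitrary order, which removes the A-waits-B / B-waits-A cycle.
            Map<String, RollupMetric> sorted = new TreeMap<>(rawBatch);

            return cache.putAllAsync(sorted);
        }
    }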
>>> Besides, the Ignite cache configuration is shown below:
>>>
>>> return ignite.getOrCreateCache(new CacheConfiguration<String, RollupMetric>()
>>>     .setName(cacheName)
>>>     .setCacheMode(CacheMode.PARTITIONED)
>>>     .setAtomicityMode(CacheAtomicityMode.ATOMIC)
>>>     .setRebalanceMode(CacheRebalanceMode.ASYNC)
>>>     .setMemoryPolicyName(MEMORY_POLICY_NAME)
>>>     .setBackups(1)
>>>     .setStatisticsEnabled(true)
>>>     .setManagementEnabled(true)
>>>     .setCopyOnRead(false)
>>>     .setQueryParallelism(20)
>>>     .setLongQueryWarningTimeout(10000) // 10s
>>>     .setEagerTtl(false)
>>>     .setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.DAYS, 365)))
>>>     .setMaxConcurrentAsyncOperations(CacheConfiguration.DFLT_MAX_CONCURRENT_ASYNC_OPS * 10)
>>>     .setAffinity(new CoupangAffinityFunction())
>>>     .setIndexedTypes(String.class, RollupMetric.class));
>>>
>>> The expiry policy is set to 1 year because, as described above, entries
>>> are removed by destroying the whole cache rather than by expiry.
>>>
>>> Ignite memory configuration:
>>>
>>> <property name="memoryConfiguration">
>>>   <bean class="org.apache.ignite.configuration.MemoryConfiguration">
>>>     <property name="memoryPolicies">
>>>       <list>
>>>         <bean class="org.apache.ignite.configuration.MemoryPolicyConfiguration">
>>>           <property name="name" value="RollupMemory"/>
>>>           <property name="pageEvictionMode" value="RANDOM_LRU"/>
>>>           <property name="metricsEnabled" value="true"/>
>>>           <property name="initialSize" value="21474836480"/> <!-- 20 GiB -->
>>>           <property name="maxSize" value="21474836480"/> <!-- 20 GiB -->
>>>         </bean>
>>>       </list>
>>>     </property>
>>>     <property name="pageSize" value="4096"/>
>>>     <property name="concurrencyLevel" value="8"/>
>>>   </bean>
>>> </property>
>>>
>>> For what reason did the deadlock occur? Is there an option or usage
>>> pattern that avoids it?
>>>
>>> I suspect it is due to the client topology changes. If so, how should we
>>> handle that?
>>>
>>> Please let me know if you need any additional information.
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
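For completeness, here is a minimal sketch of the per-minute create-and-destroy rotation described in the quoted mail. The HH:mm_Cache naming and 3-slot window come from the example above; the class, the scheduling, and the bare CacheConfiguration (which stands in for the full configuration shown earlier) are illustrative assumptions, not the poster's actual code.

    import java.time.LocalTime;
    import java.time.format.DateTimeFormatter;
    import java.util.ArrayDeque;
    import java.util.Deque;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.configuration.CacheConfiguration;

    public final class MinuteCacheRotator {

        // Keep a 3-minute window of caches, as in the example above.
        private static final int WINDOW = 3;
        private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("HH:mm");

        private final Ignite ignite;
        private final Deque<String> liveCaches = new ArrayDeque<>();

        MinuteCacheRotator(Ignite ignite) {
            this.ignite = ignite;
        }

        // Call once per minute, e.g. from a scheduled executor.
        void rotate(LocalTime now) {
            String name = FMT.format(now) + "_Cache";

            // Create the cache for the new minute, e.g. [00:03_Cache].
            ignite.getOrCreateCache(new CacheConfiguration<String, Object>(name));
            liveCaches.addLast(name);

            // Destroy the oldest cache once more than WINDOW are live, e.g.
            // [00:00_Cache]. destroyCache() drops the whole window's data in one
            // operation, which is the fast removal the mail relies on instead of expiry.
            while (liveCaches.size() > WINDOW) {
                ignite.destroyCache(liveCaches.removeFirst());
            }
        }
    }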
