Vipul Thakur created IGNITE-13298:
-------------------------------------

             Summary: Found long running cache at client end 
                 Key: IGNITE-13298
                 URL: https://issues.apache.org/jira/browse/IGNITE-13298
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 2.7.6
         Environment: ========cluster memory config/persistence================ 

<property name="gridLogger">
    <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
        <constructor-arg type="java.lang.String" value="${IGNITE_SCRIPT}/ignite-log4j2.xml" />
    </bean>
</property>
<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="metricsEnabled" value="true"/>
                <property name="persistenceEnabled" value="true" />
                <!--<property name="maxSize" value="#{10L * 1024 * 1024 * 1024}"/> -->
                <property name="maxSize" value="400Gb" />
                <!-- Increasing the buffer size to 4 GB. -->
                <property name="checkpointPageBufferSize" value="${checkpointPageBufferSize}" />
            </bean>
        </property>
        <property name="storagePath" value="${storagePath}" />
        <property name="walPath" value="${walPath}" />
        <property name="walArchivePath" value="${walArchivePath}" />
        <property name="walMode" value="LOG_ONLY" />
        <property name="pageSize" value="${pageSize}" />
        <!-- Enable write throttling. -->
        <property name="writeThrottlingEnabled" value="true" />
        <property name="walHistorySize" value="1" />
        <property name="metricsEnabled" value="true"/>
    </bean>
</property>

==================Client thread dump ===========================

2020-07-20 12:14:43
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.211-b12 mixed mode):
"Attach Listener" #788 daemon prio=9 os_prio=0 tid=0x00007fe7f4001000 nid=0x32d waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None
"Context_6_jms_314_ConsumerDispatcher" #787 daemon prio=5 os_prio=0 tid=0x00007fe6e805e000 nid=0x31a waiting on condition [0x00007fe2e5bdd000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000cb87d9d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
        at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
        at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None
"DefaultMessageListenerContainer-35" #786 prio=5 os_prio=0 tid=0x00007fe460013800 nid=0x319 in Object.wait() [0x00007fe2e5cde000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at com.solacesystems.jcsmp.impl.XMLMessageQueue.dequeue(XMLMessageQueue.java:130)
        at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:845)
        - locked <0x00000000cb8cce50> (a com.solacesystems.jcsmp.impl.XMLMessageQueueList)
        at com.solacesystems.jms.SolMessageConsumer.receive(SolMessageConsumer.java:253)
        at org.springframework.jms.connection.CachedMessageConsumer.receive(CachedMessageConsumer.java:86)
        at org.springframework.jms.support.destination.JmsDestinationAccessor.receiveFromConsumer(JmsDestinationAccessor.java:132)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveMessage(AbstractPollingMessageListenerContainer.java:418)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:303)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None
"Context_4_jms_313_ConsumerDispatcher" #785 daemon prio=5 os_prio=0 tid=0x00007fe6f8028000 nid=0x318 waiting on condition [0x00007fe2e5ddf000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000cb8cf8d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
        at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
        at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None
"DefaultMessageListenerContainer-27" #784 prio=5 os_prio=0 tid=0x00007fe45800f800 nid=0x317 in Object.wait() [0x00007fe2e5ee0000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at com.solacesystems.jcsmp.impl.XMLMessageQueue.dequeue(XMLMessageQueue.java:130)
        at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:845)
        - locked <0x00000000cb8cffc8> (a com.solacesystems.jcsmp.impl.XMLMessageQueueList)
        at com.solacesystems.jms.SolMessageConsumer.receive(SolMessageConsumer.java:253)
        at org.springframework.jms.connection.CachedMessageConsumer.receive(CachedMessageConsumer.java:86)
        at org.springframework.jms.support.destination.JmsDestinationAccessor.receiveFromConsumer(JmsDestinationAccessor.java:132)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveMessage(AbstractPollingMessageListenerContainer.java:418)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:303)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None
"Context_6_jms_312_ConsumerDispatcher" #780 daemon prio=5 os_prio=0 tid=0x00007fe6e805c800 nid=0x313 waiting on condition [0x00007fe2e62e4000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000cb751ad0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
        at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
        at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None
"DefaultMessageListenerContainer-34" #779 prio=5 os_prio=0 tid=0x00007fe450003800 nid=0x312 waiting on condition [0x00007fe2e63e5000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
        at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get0(GridCacheAdapter.java:4723)
        at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4697)
        at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1415)
        at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:928)
        at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:640)
        at com.jio.digitalapi.cacheservice.client.impl.DigitalApiIgniteCache.get(DigitalApiIgniteCache.java:87)
        at com.jio.digitalapi.eventprocessing.service.dataservice.EventManagementApacheIgniteDataService.getCustomerEntity(EventManagementApacheIgniteDataService.java:101)
        at com.jio.digitalapi.ep.dataservice.impl.AbstractEventManagementDataService.getCustomer(AbstractEventManagementDataService.java:38)
        at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventActionProcessor.getCustomerEntity(AbstractMessageEventActionProcessor.java:154)
        at com.jio.digitalapi.eventprocessing.service.event.action.processor.PrimeMemberUpdateEventActionProcessor.processEvent(PrimeMemberUpdateEventActionProcessor.java:31)
        at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventActionProcessor.processMessageEvent(AbstractMessageEventActionProcessor.java:112)
        at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventHandler.processMessage(AbstractMessageEventHandler.java:66)
        at com.jio.digitalapi.platform.core.messaging.jms.receiver.DigitalApiAsyncJmsMessageReceiver.onMessage(DigitalApiAsyncJmsMessageReceiver.java:106)
        at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:761)
        at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:699)
        at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:674)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:318)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None
            Reporter: Vipul Thakur
         Attachments: Ignite_10.143.75.24_threaddump.txt, 
Ignite_10.143.75.24_threaddump_1.txt, Ignite_10.143.75.24_threaddump_2.txt, 
Ignite_10.143.75.24_threaddump_3.txt, Ignite_10.143.75.24_threaddump_4.txt

Hi,

We have an Ignite cluster with four nodes, each with 400 GB of memory.

About two months after deploying the cluster and clients in an environment, 
the cluster hangs: initially a few clients get stuck with some processing 
pending, and after some time everything gets stuck.

After restarting everything (both clients and cluster), it works fine, 
processing around 1 crore (10 million) records in 10-15 minutes, including 
both creating and updating data.

We use transactions for create and update operations on all caches, and no 
transaction for get calls.

We have already faced this issue twice in a span of 4-5 months of deployment.

I am attaching the cluster thread dump and client thread dump.

I have seen that one ticket for *Found long running caches* already exists in 
Jira and has been moved to 2.8.1. Is that the solution? Please confirm.

    2020-06-04 20:05:55.889 WARN 1 --- [c7fd8b84-d8sdl%] 
org.apache.ignite.internal.diagnostic : Found long running cache future 
[startTime=19:59:30.288, curTime=20:05:55.882, 
fut=GridPartitionedSingleGetFuture [topVer=AffinityTopologyVersion [topVer=10, 
minorTopVer=0], key=UserKeyCacheObjectImpl [part=105, val=7701112105, 
hasValBytes=true], readThrough=true, forcePrimary=false, 
futId=681978f7271-55010ba8-d8d5-475f-97be-ff1c1916cea1, trackable=true, 
subjId=59d5e3cf-d09c-44d3-82d6-84dd35b64e10, taskName=null, 
deserializeBinary=true, skipVals=false, expiryPlc=null, canRemap=true, 
needVer=false, keepCacheObjects=false, recovery=false, node=TcpDiscoveryNode 
[id=14718baa-35e7-4d61-bde8-1e9c61978e8f, addrs=[10.135.34.67, 127.0.0.1], 
sockAddrs=[/10.135.34.67:47500, /127.0.0.1:47500], discPort=47500, order=1, 
intOrder=1, lastExchangeTime=1591277613188, loc=false, 
ver=2.7.6#20190911-sha1:21f7ca41, isClient=false], postProcessingClos=null]]

 

However, I have observed this issue in our environment even without any load 
or heavy traffic (my cluster held just 100 MB of data).

 

We don't have any transaction timeout set as of now. Should we set one?
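For illustration, a transaction timeout could be declared in the same Spring XML style as the storage configuration in the Environment field. This is only a sketch: the two timeout values are arbitrary placeholders in milliseconds, not taken from this deployment and not a recommendation, and the property is assumed to sit directly inside the IgniteConfiguration bean, which the report does not show.

```xml
<!-- Assumed to be nested inside the IgniteConfiguration bean (not shown in this report). -->
<property name="transactionConfiguration">
    <bean class="org.apache.ignite.configuration.TransactionConfiguration">
        <!-- Default timeout for transactions, in ms. Placeholder value; 0 means no timeout. -->
        <property name="defaultTxTimeout" value="30000" />
        <!-- Timeout applied to pending transactions during partition map exchange, in ms. Placeholder value. -->
        <property name="txTimeoutOnPartitionMapExchange" value="20000" />
    </bean>
</property>
```

Note that these settings bound transactions only; the get calls described above run outside transactions, so they may not be affected by them.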

 

 

Thanks

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
