Hi! How many nodes do you have? How many CPUs? Could you provide a code of com.company.node.ignite.IgnitePortionDequeuer or at least a dequeuePortions method?
Evgenii 2017-10-20 0:24 GMT+03:00 smurphy <smur...@trustwave.com>: > threaddump.tdump > <http://apache-ignite-users.70518.x6.nabble.com/file/ > t1317/threaddump.tdump> > > I am using Ignite v2.1 and my code using Optimistic/Serializable > transactions and is locking up. When it does, there are a lot of `WARNING: > Found long running transaction` and `WARNING: Found long running cache > future` messages in the logs (see below). Eventually the grid freezes and > there are a lot of "IgniteInterruptedException: Got interrupted while > waiting for future to complete." (see below). Also, see attached thread > dump > when the grid is finally stopped. > > Using visor, I could see the active threads increase until all are > consumed. > I boosted the thread pool count to 100 and saw the active threads rise > steadily until all were consumed. At theat point, the grid freezes up. > > Using visor ping multiple times, the servers always ping successfully but > the client fails intermittently. > > LONG RUNNING TRANSACTION: > > 2017-10-19 13:03:06,698 WARN [ ] grid-timeout-worker-#15%null% > {Log4JLogger.java:480} - Found long running transaction > [startTime=12:30:33.985, curTime=13:03:06.682, tx=GridNearTxLocal > [mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping > [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=126, > val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group], > hasValBytes=true], cacheId=1481046058, txKey=IgniteTxKey > [key=KeyCacheObjectImpl [part=126, val=GridCacheInternalKeyImpl > [name=SCAN_IDX, grpName=default-ds-group], hasValBytes=true], > cacheId=1481046058], val=[op=TRANSFORM, val=null], prevVal=[op=NOOP, > val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple > [val1=GetAndIncrementProcessor [], val2=[Ljava.lang.Object;@49d9dc6c]], > ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, > dhtVer=null, filters=[], filtersPassed=false, filtersSet=true, > entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry > [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=126, > val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group], > hasValBytes=true], val=null, startVer=1508434155232, ver=GridCacheVersion > [topVer=119473256, order=1508434155232, nodeOrder=2243], hash=-375255214, > extras=null, flags=0]]], prepared=0, locked=false, > nodeId=36b4d422-d011-4f77-919a-b8ffb089614b, locMapped=false, > expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, > serReadVer=null, xidVer=GridCacheVersion [topVer=119473256, > order=1508434155230, nodeOrder=2243]]], explicitLock=false, dhtVer=null, > last=false, nearEntries=0, clientFirst=false, > node=36b4d422-d011-4f77-919a-b8ffb089614b]], nearLocallyMapped=false, > colocatedLocallyMapped=false, needCheckBackup=null, hasRemoteLocks=false, > thread=<failed to find active thread 109>, > mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping > [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=126, > val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group], > hasValBytes=true], cacheId=1481046058, txKey=IgniteTxKey > [key=KeyCacheObjectImpl [part=126, val=GridCacheInternalKeyImpl > [name=SCAN_IDX, grpName=default-ds-group], hasValBytes=true], > cacheId=1481046058], val=[op=TRANSFORM, val=null], prevVal=[op=NOOP, > val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple > [val1=GetAndIncrementProcessor [], val2=[Ljava.lang.Object;@49d9dc6c]], > ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, > dhtVer=null, filters=[], filtersPassed=false, filtersSet=true, > entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry > [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=126, > val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group], > hasValBytes=true], val=null, startVer=1508434155232, ver=GridCacheVersion > [topVer=119473256, order=1508434155232, nodeOrder=2243], hash=-375255214, > extras=null, flags=0]]], prepared=0, locked=false, > nodeId=36b4d422-d011-4f77-919a-b8ffb089614b, locMapped=false, > expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, > serReadVer=null, xidVer=GridCacheVersion [topVer=119473256, > order=1508434155230, nodeOrder=2243]]], explicitLock=false, dhtVer=null, > last=false, nearEntries=0, clientFirst=false, > node=36b4d422-d011-4f77-919a-b8ffb089614b]], super=GridDhtTxLocalAdapter > [nearOnOriginatingNode=false, nearNodes=[], dhtNodes=[], > explicitLock=false, > super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, > depEnabled=false, txState=IgniteTxImplicitSingleStateImpl [init=true, > recovery=false], super=IgniteTxAdapter [xidVer=GridCacheVersion > [topVer=119473256, order=1508434155230, nodeOrder=2243], writeVer=null, > implicit=true, loc=true, threadId=109, startTime=1508434233985, > nodeId=cb0332bd-1c18-4695-a75f-0d0ed4637b55, startVer=GridCacheVersion > [topVer=119473256, order=1508434155230, nodeOrder=2243], endVer=null, > isolation=READ_COMMITTED, concurrency=OPTIMISTIC, timeout=0, > sysInvalidate=false, sys=true, plc=2, commitVer=GridCacheVersion > [topVer=119473256, order=1508434155230, nodeOrder=2243], finalizing=NONE, > invalidParts=null, state=PREPARING, timedOut=false, > topVer=AffinityTopologyVersion [topVer=2243, minorTopVer=0], > duration=1952707ms, onePhaseCommit=true], size=1]]]] > > LONG RUNNING CACHE FUTURE: > 2017-10-19 13:03:06,709 WARN [ ] grid-timeout-worker-#15%null% > {Log4JLogger.java:480} - Found long running cache future > [startTime=12:30:33.985, curTime=13:03:06.682, fut=GridNearTxFinishFuture > [futId=d7d7ea53f51-2939d9ae-16bb-41f2-a756-32150fde4ecf, > tx=GridNearTxLocal > [mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping > [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=126, > val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group], > hasValBytes=true], cacheId=1481046058, txKey=IgniteTxKey > [key=KeyCacheObjectImpl [part=126, val=GridCacheInternalKeyImpl > [name=SCAN_IDX, grpName=default-ds-group], hasValBytes=true], > cacheId=1481046058], val=[op=TRANSFORM, val=null], prevVal=[op=NOOP, > val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple > [val1=GetAndIncrementProcessor [], val2=[Ljava.lang.Object;@49d9dc6c]], > ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, > dhtVer=null, filters=[], filtersPassed=false, filtersSet=true, > entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry > [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=126, > val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group], > hasValBytes=true], val=null, startVer=1508434155232, ver=GridCacheVersion > [topVer=119473256, order=1508434155232, nodeOrder=2243], hash=-375255214, > extras=null, flags=0]]], prepared=0, locked=false, > nodeId=36b4d422-d011-4f77-919a-b8ffb089614b, locMapped=false, > expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, > serReadVer=null, xidVer=GridCacheVersion [topVer=119473256, > order=1508434155230, nodeOrder=2243]]], explicitLock=false, dhtVer=null, > last=false, nearEntries=0, clientFirst=false, > node=36b4d422-d011-4f77-919a-b8ffb089614b]], nearLocallyMapped=false, > colocatedLocallyMapped=false, needCheckBackup=null, hasRemoteLocks=false, > thread=<failed to find active thread 109>, > mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping > [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=126, > val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group], > hasValBytes=true], cacheId=1481046058, txKey=IgniteTxKey > [key=KeyCacheObjectImpl [part=126, val=GridCacheInternalKeyImpl > [name=SCAN_IDX, grpName=default-ds-group], hasValBytes=true], > cacheId=1481046058], val=[op=TRANSFORM, val=null], prevVal=[op=NOOP, > val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple > [val1=GetAndIncrementProcessor [], val2=[Ljava.lang.Object;@49d9dc6c]], > ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, > dhtVer=null, filters=[], filtersPassed=false, filtersSet=true, > entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry > [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=126, > val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group], > hasValBytes=true], val=null, startVer=1508434155232, ver=GridCacheVersion > [topVer=119473256, order=1508434155232, nodeOrder=2243], hash=-375255214, > extras=null, flags=0]]], prepared=0, locked=false, > nodeId=36b4d422-d011-4f77-919a-b8ffb089614b, locMapped=false, > expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, > serReadVer=null, xidVer=GridCacheVersion [topVer=119473256, > order=1508434155230, nodeOrder=2243]]], explicitLock=false, dhtVer=null, > last=false, nearEntries=0, clientFirst=false, > node=36b4d422-d011-4f77-919a-b8ffb089614b]], super=GridDhtTxLocalAdapter > [nearOnOriginatingNode=false, nearNodes=[], dhtNodes=[], > explicitLock=false, > super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, > depEnabled=false, txState=IgniteTxImplicitSingleStateImpl [init=true, > recovery=false], super=IgniteTxAdapter [xidVer=GridCacheVersion > [topVer=119473256, order=1508434155230, nodeOrder=2243], writeVer=null, > implicit=true, loc=true, threadId=109, startTime=1508434233985, > nodeId=cb0332bd-1c18-4695-a75f-0d0ed4637b55, startVer=GridCacheVersion > [topVer=119473256, order=1508434155230, nodeOrder=2243], endVer=null, > isolation=READ_COMMITTED, concurrency=OPTIMISTIC, timeout=0, > sysInvalidate=false, sys=true, plc=2, commitVer=GridCacheVersion > [topVer=119473256, order=1508434155230, nodeOrder=2243], finalizing=NONE, > invalidParts=null, state=PREPARING, timedOut=false, > topVer=AffinityTopologyVersion [topVer=2243, minorTopVer=0], > duration=1952717ms, onePhaseCommit=true], size=1]]], commit=true, > mappings=IgniteTxMappingsSingleImpl [mapping=GridDistributedTxMapping > [entries=[IgniteTxEntry [key=KeyCacheObjectImpl [part=126, > val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group], > hasValBytes=true], cacheId=1481046058, txKey=IgniteTxKey > [key=KeyCacheObjectImpl [part=126, val=GridCacheInternalKeyImpl > [name=SCAN_IDX, grpName=default-ds-group], hasValBytes=true], > cacheId=1481046058], val=[op=TRANSFORM, val=null], prevVal=[op=NOOP, > val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=[IgniteBiTuple > [val1=GetAndIncrementProcessor [], val2=[Ljava.lang.Object;@49d9dc6c]], > ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, > dhtVer=null, filters=[], filtersPassed=false, filtersSet=true, > entry=GridDhtDetachedCacheEntry [super=GridDistributedCacheEntry > [super=GridCacheMapEntry [key=KeyCacheObjectImpl [part=126, > val=GridCacheInternalKeyImpl [name=SCAN_IDX, grpName=default-ds-group], > hasValBytes=true], val=null, startVer=1508434155232, ver=GridCacheVersion > [topVer=119473256, order=1508434155232, nodeOrder=2243], hash=-375255214, > extras=null, flags=0]]], prepared=0, locked=false, > nodeId=36b4d422-d011-4f77-919a-b8ffb089614b, locMapped=false, > expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, > serReadVer=null, xidVer=GridCacheVersion [topVer=119473256, > order=1508434155230, nodeOrder=2243]]], explicitLock=false, dhtVer=null, > last=false, nearEntries=0, clientFirst=false, > node=36b4d422-d011-4f77-919a-b8ffb089614b]], trackable=true, > finishOnePhaseCalled=false, innerFuts=[], super=GridCompoundIdentityFuture > [super=GridCompoundFuture [rdc=AlwaysTrueReducer [], initFlag=0, > lsnrCalls=0, done=false, cancelled=false, err=null, futs=[]]]]] > > IGNITECHECKEDEXCEPTION: > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: SEVERE: Failed to execute > job [jobId=ac08ea53f51-cb0332bd-1c18-4695-a75f-0d0ed4637b55, > ses=GridJobSessionImpl [ses=GridTaskSessionImpl > [taskName=com.company.fragment.node.callable.DequeuePortionsCallable, > dep=GridDeployment [ts=1508433959490, depMode=SHARED, > clsLdr=sun.misc.Launcher$AppClassLoader@764c12b6, > clsLdrId=fa68ba53f51-85f1f563-695c-43f9-a263-e9f67e8925b7, userVer=0, > loc=true, > sampleClsName=o.a.i.i.processors.task.GridTaskProcessor$ > TaskDiscoveryListener$1, > pendingUndeploy=false, undeployed=false, usage=64], > taskClsName=com.company.fragment.node.callable.DequeuePortionsCallable, > sesId=9c08ea53f51-cb0332bd-1c18-4695-a75f-0d0ed4637b55, > startTime=1508434417976, endTime=9223372036854775807, > taskNodeId=cb0332bd-1c18-4695-a75f-0d0ed4637b55, > clsLdr=sun.misc.Launcher$AppClassLoader@764c12b6, closed=false, > cpSpi=null, > failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false, > subjId=cb0332bd-1c18-4695-a75f-0d0ed4637b55, mapFut=IgniteFuture > [orig=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, > hash=2051666823]], execName=null], > jobId=ac08ea53f51-cb0332bd-1c18-4695-a75f-0d0ed4637b55]] > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: class > org.apache.ignite.IgniteException: class > org.apache.ignite.IgniteInterruptedException: Got interrupted while > waiting > for future to complete. > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2. > execute(GridClosureProcessor.java:1850) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.processors.job.GridJobWorker$ > 2.call(GridJobWorker.java:566) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils. > java:6608) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.processors.job.GridJobWorker. > execute0(GridJobWorker.java:560) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.processors.job.GridJobWorker. > body(GridJobWorker.java:489) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.processors.job.GridJobProcessor. > processJobExecuteRequest(GridJobProcessor.java:1181) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.processors.job.GridJobProcessor$ > JobExecutionListener.onMessage(GridJobProcessor.java:1908) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.managers.communication. > GridIoManager.invokeListener(GridIoManager.java:1556) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.managers.communication.GridIoManager. > processRegularMessage0(GridIoManager.java:1184) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.managers.communication. > GridIoManager.access$4200(GridIoManager.java:126) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.managers.communication.GridIoManager$9.run( > GridIoManager.java:1097) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > java.lang.Thread.run(Unknown Source) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: Caused by: > javax.cache.CacheException: class > org.apache.ignite.IgniteInterruptedException: Got interrupted while > waiting > for future to complete. > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.processors.cache.GridCacheUtils. > convertToCacheException(GridCacheUtils.java:1312) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.processors.cache.IgniteCacheProxy. > cacheException(IgniteCacheProxy.java:2630) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.processors.cache.IgniteCacheProxy.get( > IgniteCacheProxy.java:1188) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > com.company.fragment.node.ignite.IgnitePortionDequeuer.dequeuePortionIds( > IgnitePortionDequeuer.java:276) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > com.company.fragment.node.ignite.IgnitePortionDequeuer._dequeuePortionIds( > IgnitePortionDequeuer.java:196) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > com.company.fragment.node.ignite.IgnitePortionDequeuer.dequeuePortions( > IgnitePortionDequeuer.java:146) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > com.company.fragment.node.callable.DequeuePortionsCallable.call( > DequeuePortionsCallable.java:44) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > com.company.fragment.node.callable.DequeuePortionsCallable.call( > DequeuePortionsCallable.java:35) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2. > execute(GridClosureProcessor.java:1847) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: ... 14 more > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: Caused by: class > org.apache.ignite.IgniteInterruptedException: Got interrupted while > waiting > for future to complete. > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.util.IgniteUtils$3.apply(IgniteUtils.java:796) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: at > org.apache.ignite.internal.util.IgniteUtils$3.apply(IgniteUtils.java:794) > Oct 19 13:29:22 prod-srvr apache-ignite[46822]: ... 23 more > > > > > > -- > Sent from: http://apache-ignite-users.70518.x6.nabble.com/ >