We've run into a problem recently where it appears our cache is deadlocking
during loading. What I mean by "loading" is that we start up a new cluster
in AWS, unconnected to any existing cluster, and then shove a bunch of data
into it from Kafka. During this process it's not taking any significant
traffic - just healthchecks, ingesting data, and me clicking around in it. 

We've had several deployments in a row fail, apparently due to deadlocking
in the loading process. We're typically seeing a number of threads blocked
with stack traces like this:

"data-streamer-stripe-3-#20" id=124 state=WAITING
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
    at
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
    at
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
    at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.invoke(GridDhtAtomicCache.java:786)
    at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.invoke(IgniteCacheProxyImpl.java:1359)
    at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.invoke(IgniteCacheProxyImpl.java:1405)
    at
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.invoke(GatewayProtectedCacheProxy.java:1362)
    at
com.mycompany.myapp.myPackage.dao.ignite.cache.streamer.VersionCheckingStreamReceiver.receive(VersionCheckingStreamReceiver.java:33)
    at
org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:137)
    at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:397)
    at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:302)
    at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:59)
    at
org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:89)
    at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
    at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)
    at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
    at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)
    at
org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:505)
    at java.lang.Thread.run(Thread.java:748)
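
For context, the VersionCheckingStreamReceiver in that trace is our own class.
It's roughly the shape below - the value type and the version-check details are
simplified stand-ins rather than our real code, but the part that matters matches
the trace: receive() calls cache.invoke() synchronously, and the stripe thread
then parks on the update future:

import java.util.Collection;
import java.util.Map;

import javax.cache.processor.EntryProcessor;
import javax.cache.processor.MutableEntry;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteException;
import org.apache.ignite.stream.StreamReceiver;

/** Hypothetical value type; stand-in for our real entity. */
class MyEntity implements java.io.Serializable {
    final long version;
    MyEntity(long version) { this.version = version; }
}

public class VersionCheckingStreamReceiver implements StreamReceiver<String, MyEntity> {
    @Override public void receive(IgniteCache<String, MyEntity> cache,
        Collection<Map.Entry<String, MyEntity>> entries) throws IgniteException {
        for (Map.Entry<String, MyEntity> e : entries)
            // Synchronous invoke(): the stripe thread blocks here until the
            // atomic update future completes (GridDhtAtomicCache.invoke above).
            cache.invoke(e.getKey(), new VersionCheck(), e.getValue());
    }

    /** Keeps an entry only if the incoming version is newer than the stored one. */
    static class VersionCheck implements EntryProcessor<String, MyEntity, Void>, java.io.Serializable {
        @Override public Void process(MutableEntry<String, MyEntity> entry, Object... args) {
            MyEntity incoming = (MyEntity) args[0];
            if (!entry.exists() || incoming.version > entry.getValue().version)
                entry.setValue(incoming);
            return null;
        }
    }
}

So, as far as I can tell, every streamed entry turns into a blocking invoke() on a
data-streamer stripe thread, one entry at a time.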


The machines seem to go into a moderate-CPU loop (~70% usage). My best guess
is that most of that CPU is being spent in threads like this:

"exchange-worker-#62" id=177 state=RUNNABLE
    at
org.apache.ignite.internal.util.tostring.SBLimitedLength.toString(SBLimitedLength.java:283)
    at
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1012)
    at
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:826)
    at
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:783)
    at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicAbstractUpdateFuture.toString(GridDhtAtomicAbstractUpdateFuture.java:588)
    at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicSingleUpdateFuture.toString(GridDhtAtomicSingleUpdateFuture.java:134)
    at java.lang.String.valueOf(String.java:2994)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at java.util.AbstractCollection.toString(AbstractCollection.java:462)
    at java.lang.String.valueOf(String.java:2994)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at
org.apache.ignite.internal.processors.cache.CacheObjectsReleaseFuture.toString(CacheObjectsReleaseFuture.java:58)
    at java.lang.String.valueOf(String.java:2994)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at java.util.AbstractCollection.toString(AbstractCollection.java:462)
    at java.lang.String.valueOf(String.java:2994)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at
org.apache.ignite.internal.processors.cache.CacheObjectsReleaseFuture.toString(CacheObjectsReleaseFuture.java:58)
    at java.lang.String.valueOf(String.java:2994)
    at
org.apache.ignite.internal.util.GridStringBuilder.a(GridStringBuilder.java:101)
    at
org.apache.ignite.internal.util.tostring.SBLimitedLength.a(SBLimitedLength.java:88)
    at
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:939)
    at
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toStringImpl(GridToStringBuilder.java:1005)
    at
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:685)
    at
org.apache.ignite.internal.util.tostring.GridToStringBuilder.toString(GridToStringBuilder.java:621)
    at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.toString(GridDhtPartitionsExchangeFuture.java:3555)
    at java.lang.String.valueOf(String.java:2994)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.dumpDebugInfo(GridCachePartitionExchangeManager.java:1569)
    at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2359)
    at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)



I've seen elsewhere that putAll()/getAll() can cause deadlocks, but we're not
using those, and I don't believe a slow network is the problem. What else can
I look at or try to resolve this? Are we just throwing data into the caches
too fast? Could a weird pattern in the data (e.g., large entities) cause this?
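
For reference, this is roughly how we create the streamer, plus the throttling
knobs I was thinking of trying if the answer turns out to be "slow down". The
cache name, MyEntity, and the numbers below are stand-ins and guesses, not our
real config:

import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

public class Loader {
    /** 'records' is a stand-in for our Kafka consumer loop. */
    static void load(Ignite ignite, Iterable<Map.Entry<String, MyEntity>> records) {
        try (IgniteDataStreamer<String, MyEntity> streamer = ignite.dataStreamer("myCache")) {
            streamer.receiver(new VersionCheckingStreamReceiver());

            // Throttling knobs: fewer in-flight batches and smaller buffers per node,
            // so we stop flooding the striped pool. Values are guesses.
            streamer.perNodeParallelOperations(4);
            streamer.perNodeBufferSize(256);
            streamer.autoFlushFrequency(1_000); // flush at least once a second

            for (Map.Entry<String, MyEntity> rec : records)
                streamer.addData(rec.getKey(), rec.getValue());
        } // close() flushes whatever is still buffered
    }
}

Would dialing those settings down be expected to help here, or should I be
looking elsewhere?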

I've attached a full thread dump in case that helps.

Thanks in advance,
BKR

IgniteStackTrace_redacted.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t1824/IgniteStackTrace_redacted.txt>
  


