Hi,

I had a problem a few weeks ago with a shared cache stopping - I thought I
had fixed it by adding backups to the cache configuration but it happened
again at the end of last week and we had to restart all nodes on the grid to
restore order. I'd appreciate any pointers on why this is happening and what
I can do to prevent it. Log file extracts are below.  

Thanks
Jon


At 14:34 the cache was rebalanced successfully:

2017-05-19 14:34:19,845 INFO  [exchange-worker-#27%null%] java.JavaLogger
(JavaLogger.java:273) - Cache rebalancing scheduled:
[cache=datastructures_0, waitList=[ignite-sys-cache,
ignite-marshaller-sys-cache, ignite-atomics-sys-cache]]
2017-05-19 14:34:19,860 INFO  [exchange-worker-#27%null%] java.JavaLogger
(JavaLogger.java:273) - Rebalancing required [top=AffinityTopologyVersion
[topVer=251, minorTopVer=0], evt=NODE_FAILED,
node=c64b9b52-3013-4f55-91ec-422045a83412]
2017-05-19 14:34:19,863 INFO  [sys-#406%null%] java.JavaLogger
(JavaLogger.java:273) - <datastructures_0> Starting rebalancing
[cache=datastructures_0, mode=SYNC,
fromNode=c49dd7a7-ad75-4bdd-96e8-7cbd19c34970, partitionsCount=13,
topology=AffinityTopologyVersion [topVer=251, minorTopVer=0],
updateSeq=26784]
2017-05-19 14:34:19,877 INFO  [sys-#406%null%] java.JavaLogger
(JavaLogger.java:273) - <datastructures_0> Starting rebalancing
[cache=datastructures_0, mode=SYNC,
fromNode=7b87f6d3-8b9b-461a-b4d4-eef1132b5b1d, partitionsCount=27,
topology=AffinityTopologyVersion [topVer=251, minorTopVer=0],
updateSeq=26784]
2017-05-19 14:34:19,878 INFO  [sys-#406%null%] java.JavaLogger
(JavaLogger.java:273) - <datastructures_0> Starting rebalancing
[cache=datastructures_0, mode=SYNC,
fromNode=222368cd-1ea5-4247-90be-00343c7f183f, partitionsCount=20,
topology=AffinityTopologyVersion [topVer=251, minorTopVer=0],
updateSeq=26784]
2017-05-19 14:34:19,879 INFO  [sys-#406%null%] java.JavaLogger
(JavaLogger.java:273) - <datastructures_0> Starting rebalancing
[cache=datastructures_0, mode=SYNC,
fromNode=429e48e2-5500-426d-9849-1b1acdd58e0c, partitionsCount=18,
topology=AffinityTopologyVersion [topVer=251, minorTopVer=0],
updateSeq=26784]
2017-05-19 14:34:20,219 INFO  [sys-#407%null%] java.JavaLogger
(JavaLogger.java:273) - <datastructures_0> Completed rebalancing
[cache=datastructures_0, fromNode=222368cd-1ea5-4247-90be-00343c7f183f,
topology=AffinityTopologyVersion [topVer=251, minorTopVer=0], time=352 ms]
2017-05-19 14:34:25,818 INFO  [sys-#408%null%] java.JavaLogger
(JavaLogger.java:273) - <datastructures_0> Completed rebalancing
[cache=datastructures_0, fromNode=c49dd7a7-ad75-4bdd-96e8-7cbd19c34970,
topology=AffinityTopologyVersion [topVer=251, minorTopVer=0], time=5952 ms]
2017-05-19 14:34:29,454 INFO  [sys-#420%null%] java.JavaLogger
(JavaLogger.java:273) - <datastructures_0> Completed rebalancing
[cache=datastructures_0, fromNode=429e48e2-5500-426d-9849-1b1acdd58e0c,
topology=AffinityTopologyVersion [topVer=251, minorTopVer=0], time=9595 ms]
2017-05-19 14:34:33,054 INFO  [sys-#417%null%] java.JavaLogger
(JavaLogger.java:273) - <datastructures_0> Completed (final) rebalancing
[cache=datastructures_0, fromNode=7b87f6d3-8b9b-461a-b4d4-eef1132b5b1d,
topology=AffinityTopologyVersion [topVer=251, minorTopVer=0], time=13195 ms]

But then less than an hour later the node failed and the following is an
condensed extract of the log:

2017-05-19 15:22:29,662 WARN  [disco-event-worker-#20%null%] java.JavaLogger
(JavaLogger.java:278) - Node FAILED: TcpDiscoveryNode
[id=c49dd7a7-ad75-4bdd-96e8-7cbd19c34970, addrs=[10.127.246.164, 127.0.0.1],
sockAddrs=[LNAPLN109PRD.ldn.emea.cib/10.127.246.164:47504,
/127.0.0.1:47504], discPort=47504, order=248, intOrder=128,
lastExchangeTime=1495066956338, loc=false, ver=1.8.0#20161205-sha1:9ca40dbe,
isClient=false]
2017-05-19 15:22:29,992 INFO  [disco-event-worker-#20%null%] java.JavaLogger
(JavaLogger.java:273) - Topology snapshot [ver=252, servers=4, clients=0,
CPUs=24, heap=2.6GB]
2017-05-19 15:22:36,403 INFO  [exchange-worker-#27%null%] java.JavaLogger
(JavaLogger.java:273) - Cache rebalancing scheduled:
[cache=datastructures_0, waitList=[ignite-sys-cache,
ignite-marshaller-sys-cache, ignite-atomics-sys-cache]]
2017-05-19 15:22:36,451 INFO  [exchange-worker-#27%null%] java.JavaLogger
(JavaLogger.java:273) - Rebalancing required [top=AffinityTopologyVersion
[topVer=252, minorTopVer=0], evt=NODE_FAILED,
node=c49dd7a7-ad75-4bdd-96e8-7cbd19c34970]
2017-05-19 15:22:36,473 INFO  [sys-#411%null%] java.JavaLogger
(JavaLogger.java:273) - <datastructures_0> Starting rebalancing
[cache=datastructures_0, mode=SYNC,
fromNode=222368cd-1ea5-4247-90be-00343c7f183f, partitionsCount=34,
topology=AffinityTopologyVersion [topVer=252, minorTopVer=0],
updateSeq=27363]
2017-05-19 15:22:36,514 INFO  [sys-#411%null%] java.JavaLogger
(JavaLogger.java:273) - <datastructures_0> Starting rebalancing
[cache=datastructures_0, mode=SYNC,
fromNode=7b87f6d3-8b9b-461a-b4d4-eef1132b5b1d, partitionsCount=34,
topology=AffinityTopologyVersion [topVer=252, minorTopVer=0],
updateSeq=27363]
2017-05-19 15:22:36,515 INFO  [sys-#411%null%] java.JavaLogger
(JavaLogger.java:273) - <datastructures_0> Starting rebalancing
[cache=datastructures_0, mode=SYNC,
fromNode=429e48e2-5500-426d-9849-1b1acdd58e0c, partitionsCount=47,
topology=AffinityTopologyVersion [topVer=252, minorTopVer=0],
updateSeq=27363]
2017-05-19 15:23:06,576 INFO  [tcp-disco-msg-worker-#2%null%]
java.JavaLogger (JavaLogger.java:273) - Local node seems to be disconnected
from topology (failure detection timeout is reached)
[failureDetectionTimeout=10000, connCheckFreq=3333]
2017-05-19 15:23:07,005 WARN  [tcp-disco-msg-worker-#2%null%]
java.JavaLogger (JavaLogger.java:278) - Node is out of topology (probably,
due to short-time network problems).
2017-05-19 15:23:07,049 WARN  [disco-event-worker-#20%null%] java.JavaLogger
(JavaLogger.java:278) - Local node SEGMENTED: TcpDiscoveryNode
[id=bc2a043b-b90a-45d9-9761-962b2e8b0404, addrs=[10.127.246.164, 127.0.0.1],
sockAddrs=[LNAPLN109PRD.ldn.emea.cib/10.127.246.164:47500,
/127.0.0.1:47500], discPort=47500, order=244, intOrder=124,
lastExchangeTime=1495203787004, loc=true, ver=1.8.0#20161205-sha1:9ca40dbe,
isClient=false]
2017-05-19 15:23:07,095 INFO  [grid-timeout-worker-#7%null%] java.JavaLogger
(JavaLogger.java:273) - 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=bc2a043b, name=null, uptime=38:00:32:160]
    ^-- H/N/C [hosts=2, nodes=4, CPUs=24]
    ^-- CPU [cur=31.33%, avg=1.42%, GC=50.87%]
    ^-- Heap [used=158MB, free=65.18%, comm=343MB]
    ^-- Non heap [used=69MB, free=-1%, comm=70MB]
    ^-- Public thread pool [active=0, idle=2, qSize=0]
    ^-- System thread pool [active=3, idle=29, qSize=0]
    ^-- Outbound messages queue [size=0]
2017-05-19 15:23:07,279 WARN  [disco-event-worker-#20%null%] java.JavaLogger
(JavaLogger.java:278) - Stopping local node according to configured
segmentation policy.
2017-05-19 15:23:07,290 WARN  [disco-event-worker-#20%null%] java.JavaLogger
(JavaLogger.java:278) - Node FAILED: TcpDiscoveryNode
[id=222368cd-1ea5-4247-90be-00343c7f183f, addrs=[10.127.246.74, 127.0.0.1],
sockAddrs=[/127.0.0.1:47500, LNAPLN343PRD.ldn.emea.cib/10.127.246.74:47500],
discPort=47500, order=211, intOrder=110, lastExchangeTime=1495066951942,
loc=false, ver=1.8.0#20161205-sha1:9ca40dbe, isClient=false]
2017-05-19 15:23:07,319 INFO  [disco-event-worker-#20%null%] java.JavaLogger
(JavaLogger.java:273) - Topology snapshot [ver=253, servers=3, clients=0,
CPUs=16, heap=1.3GB]
2017-05-19 15:23:07,360 WARN  [disco-event-worker-#20%null%] java.JavaLogger
(JavaLogger.java:278) - Node FAILED: TcpDiscoveryNode
[id=429e48e2-5500-426d-9849-1b1acdd58e0c, addrs=[10.127.246.164, 127.0.0.1],
sockAddrs=[LNAPLN109PRD.ldn.emea.cib/10.127.246.164:47502,
/127.0.0.1:47502], discPort=47502, order=246, intOrder=126,
lastExchangeTime=1495066955707, loc=false, ver=1.8.0#20161205-sha1:9ca40dbe,
isClient=false]
2017-05-19 15:23:07,365 INFO  [disco-event-worker-#20%null%] java.JavaLogger
(JavaLogger.java:273) - Topology snapshot [ver=254, servers=2, clients=0,
CPUs=16, heap=0.89GB]
2017-05-19 15:23:07,394 WARN  [disco-event-worker-#20%null%] java.JavaLogger
(JavaLogger.java:278) - Node FAILED: TcpDiscoveryNode
[id=7b87f6d3-8b9b-461a-b4d4-eef1132b5b1d, addrs=[10.127.246.164, 127.0.0.1],
sockAddrs=[/127.0.0.1:47503,
LNAPLN109PRD.ldn.emea.cib/10.127.246.164:47503], discPort=47503, order=247,
intOrder=127, lastExchangeTime=1495066956338, loc=false,
ver=1.8.0#20161205-sha1:9ca40dbe, isClient=false]
2017-05-19 15:23:07,403 INFO  [disco-event-worker-#20%null%] java.JavaLogger
(JavaLogger.java:273) - Topology snapshot [ver=255, servers=1, clients=0,
CPUs=16, heap=0.44GB]
2017-05-19 15:23:07,540 INFO  [Thread-1622] java.JavaLogger
(JavaLogger.java:273) - Command protocol successfully stopped: TCP binary
class org.apache.ignite.IgniteException: Failed to wait for topology update,
cache (or node) is stopping.
        at
org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:927)
        at
org.apache.ignite.internal.processors.datastructures.GridCacheQueueAdapter.iterator(GridCacheQueueAdapter.java:230)
        at java.util.AbstractCollection.contains(Unknown Source)
        at
org.apache.ignite.internal.processors.datastructures.GridCacheQueueProxy.contains(GridCacheQueueProxy.java:170)
        at
com.indosuez.server.crossfxcurve.CrossFxCurvePubService.addToQueue(CrossFxCurvePubService.java:267)
        at
com.indosuez.server.crossfxcurve.CrossFxCurvePubService.checkForUpdate(CrossFxCurvePubService.java:230)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
Source)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to wait
for topology update, cache (or node) is stopping.
        at
org.apache.ignite.internal.processors.cache.GridCacheAffinityManager.cancelFutures(GridCacheAffinityManager.java:92)
        at
org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStop(GridCacheProcessor.java:906)
        at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:1931)
        at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:1877)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2363)
        at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2326)
        at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:322)
        at org.apache.ignite.Ignition.stop(Ignition.java:224)
        at
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$8.run(GridDiscoveryManager.java:1949)
        ... 1 more
2017-05-19 15:23:07,709 INFO  [Thread-1622] java.JavaLogger
(JavaLogger.java:273) - <datastructures_0> Cancelled rebalancing from all
nodes [cache=datastructures_0, topology=AffinityTopologyVersion [topVer=252,
minorTopVer=0]]
class org.apache.ignite.IgniteException: Operation has been cancelled (node
is stopping).
        at
org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:927)
        at
org.apache.ignite.internal.processors.datastructures.GridCacheQueueAdapter.iterator(GridCacheQueueAdapter.java:230)
        at java.util.AbstractCollection.contains(Unknown Source)
        at
org.apache.ignite.internal.processors.datastructures.GridCacheQueueProxy.contains(GridCacheQueueProxy.java:170)
        at
com.indosuez.server.crossfxcurve.CrossFxCurvePubService.addToQueue(CrossFxCurvePubService.java:267)
        at
com.indosuez.server.crossfxcurve.CrossFxCurvePubService.checkForUpdate(CrossFxCurvePubService.java:230)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
Source)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: class org.apache.ignite.internal.NodeStoppingException: Operation
has been cancelled (node is stopping).
        at
org.apache.ignite.internal.processors.cache.GridCacheMvccManager.stopError(GridCacheMvccManager.java:414)
        at
org.apache.ignite.internal.processors.cache.GridCacheMvccManager.onFutureAdded(GridCacheMvccManager.java:601)
        at
org.apache.ignite.internal.processors.cache.GridCacheMvccManager.addFuture(GridCacheMvccManager.java:486)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.map(GridPartitionedSingleGetFuture.java:270)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:196)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync0(GridDhtAtomicCache.java:1439)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.get(GridDhtAtomicCache.java:482)
        at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4783)
        at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1395)
        at
org.apache.ignite.internal.processors.datastructures.GridCacheQueueAdapter.iterator(GridCacheQueueAdapter.java:223)
        ... 11 more
java.lang.IllegalStateException: Cache has been stopped: datastructures_0
        at
org.apache.ignite.internal.processors.cache.GridCacheGateway.checkState(GridCacheGateway.java:85)
        at
org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:68)
        at
org.apache.ignite.internal.processors.datastructures.GridCacheQueueProxy.contains(GridCacheQueueProxy.java:160)
        at
com.indosuez.server.crossfxcurve.CrossFxCurvePubService.addToQueue(CrossFxCurvePubService.java:267)
        at
com.indosuez.server.crossfxcurve.CrossFxCurvePubService.checkForUpdate(CrossFxCurvePubService.java:230)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
Source)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
2017-05-19 15:23:07,773 INFO  [Thread-1622] java.JavaLogger
(JavaLogger.java:273) - Stopped cache: ignite-marshaller-sys-cache
2017-05-19 15:23:07,774 INFO  [Thread-1622] java.JavaLogger
(JavaLogger.java:273) - Stopped cache: ignite-sys-cache
2017-05-19 15:23:07,775 INFO  [Thread-1622] java.JavaLogger
(JavaLogger.java:273) - Stopped cache: ignite-atomics-sys-cache
2017-05-19 15:23:07,776 INFO  [Thread-1622] java.JavaLogger
(JavaLogger.java:273) - Stopped cache: datastructures_0
2017-05-19 15:23:07,785 INFO  [Thread-1622] java.JavaLogger
(JavaLogger.java:273) - Removed undeployed class: GridDeployment
[ts=1495066953881, depMode=SHARED,
clsLdr=sun.misc.Launcher$AppClassLoader@3b7951,
clsLdrId=b371fe81c51-bc2a043b-b90a-45d9-9761-962b2e8b0404, userVer=0,
loc=true,
sampleClsName=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap,
pendingUndeploy=false, undeployed=true, usage=0]
2017-05-19 15:23:07,835 INFO  [Thread-1622] java.JavaLogger
(JavaLogger.java:273) - 
>>> +---------------------------------------------------------------------------------+
>>> Ignite ver. 1.8.0#20161205-sha1:9ca40dbeb7d559fcb299bdb6f5c90cdf8ce7e533
>>> stopped OK
>>> +---------------------------------------------------------------------------------+
>>> Grid uptime: 38:00:32:899
java.lang.IllegalStateException: Cache has been stopped: datastructures_0
        at
org.apache.ignite.internal.processors.cache.GridCacheGateway.checkState(GridCacheGateway.java:85)
        at
org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:68)
        at
org.apache.ignite.internal.processors.datastructures.GridCacheQueueProxy.contains(GridCacheQueueProxy.java:160)
        at
com.indosuez.server.crossfxcurve.CrossFxCurvePubService.addToQueue(CrossFxCurvePubService.java:267)
        at
com.indosuez.server.crossfxcurve.CrossFxCurvePubService.checkForUpdate(CrossFxCurvePubService.java:230)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
Source)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)






--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Cache-has-been-stopped-tp13056.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Reply via email to