Hi there,
When I tried to clear one specific cache, the service nodes shut down unexpectedly.
visor> cache -clear -c=@c7
[17:26:35] Topology snapshot [ver=62, servers=9, clients=0, CPUs=16, heap=63.0GB]
[17:26:38,009][SEVERE][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
    at java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:519)
    at java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:682)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:5779)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2161)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
[17:26:38] Topology snapshot [ver=71, servers=1, clients=0, CPUs=8, heap=7.0GB]
[17:26:43] Topology snapshot [ver=62, servers=9, clients=0, CPUs=16, heap=63.0GB]
[17:26:43] Topology snapshot [ver=62, servers=9, clients=0, CPUs=16, heap=63.0GB]
[17:26:44] Topology snapshot [ver=62, servers=9, clients=0, CPUs=16, heap=63.0GB]
[17:27:19] Topology snapshot [ver=63, servers=8, clients=0, CPUs=16, heap=56.0GB]
[17:27:19] Topology snapshot [ver=63, servers=8, clients=0, CPUs=16, heap=56.0GB]
[17:27:19] Topology snapshot [ver=64, servers=7, clients=0, CPUs=16, heap=49.0GB]
[17:27:19] Topology snapshot [ver=64, servers=7, clients=0, CPUs=16, heap=49.0GB]
[17:27:19] Topology snapshot [ver=65, servers=6, clients=0, CPUs=16, heap=42.0GB]
[17:27:19] Topology snapshot [ver=65, servers=6, clients=0, CPUs=16, heap=42.0GB]
[17:27:19] Topology snapshot [ver=67, servers=5, clients=0, CPUs=16, heap=35.0GB]
[17:27:19] Topology snapshot [ver=67, servers=4, clients=0, CPUs=16, heap=28.0GB]
[17:27:19] Topology snapshot [ver=67, servers=5, clients=0, CPUs=16, heap=35.0GB]
[17:27:19] Topology snapshot [ver=67, servers=4, clients=0, CPUs=16, heap=28.0GB]
[17:27:23,326][SEVERE][sys-#19%null%][GridCachePartitionExchangeManager] Failed
to send local partition map to node [node=TcpDiscoveryNode
[id=b247699c-8545-40a4-9b9c-aa478ea3ca55, addrs=[0:0:0:0:0:0:0:1%lo,
10.120.70.122, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47501,
/0:0:0:0:0:0:0:1%lo:47501, /10.120.70.122:47501, /127.0.0.1:47501],
discPort=47501, order=2, intOrder=2, lastExchangeTime=1461673945474, loc=false,
ver=1.5.0#20151229-sha1:f1f8cda2, isClient=false],
exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=62,
minorTopVer=0], nodeId=fe76324d, evt=NODE_FAILED]]
class org.apache.ignite.IgniteCheckedException: Failed to send message (node
may have left the grid or TCP connection cannot be established due to firewall
issues) [node=TcpDiscoveryNode [id=b247699c-8545-40a4-9b9c-aa478ea3ca55,
addrs=[0:0:0:0:0:0:0:1%lo, 10.120.70.122, 127.0.0.1],
sockAddrs=[/0:0:0:0:0:0:0:1%lo:47501, /0:0:0:0:0:0:0:1%lo:47501,
/10.120.70.122:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2,
lastExchangeTime=1461673945474, loc=false, ver=1.5.0#20151229-sha1:f1f8cda2,
isClient=false], topic=TOPIC_CACHE, msg=GridDhtPartitionsSingleMessage
[parts={1=GridDhtPartitionMap2 [moving=12, size=121],
-2146922738=GridDhtPartitionMap2 [moving=12, size=121],
745661760=GridDhtPartitionMap2 [moving=12, size=121],
-2100569601=GridDhtPartitionMap2 [moving=0, size=100],
-1071296927=GridDhtPartitionMap2 [moving=12, size=121],
-1667118441=GridDhtPartitionMap2 [moving=12, size=121],
689859866=GridDhtPartitionMap2 [moving=12, size=121],
810756007=GridDhtPartitionMap2 [moving=12, size=121],
-1582327725=GridDhtPartitionMap2 [moving=12, size=121],
1316949047=GridDhtPartitionMap2 [moving=12, size=121],
1325947219=GridDhtPartitionMap2 [moving=0, size=20]}, partCntrs=null,
client=false, super=GridDhtPartitionsAbstractMessage
[exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=62,
minorTopVer=0], nodeId=fe76324d, evt=NODE_FAILED], lastVer=GridCacheVersion
[topVer=73153881, nodeOrderDrId=8, globalTime=1461749161426,
order=1461727649806], super=GridCacheMessage [msgId=12782, depInfo=null,
err=null, skipPrepare=false, cacheId=0, cacheId=0]]], policy=2]
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1082)
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1146)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.sendNoRetry(GridCacheIoManager.java:873)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.sendLocalPartitions(GridCachePartitionExchangeManager.java:814)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionRequest(GridCachePartitionExchangeManager.java:1087)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1100(GridCachePartitionExchangeManager.java:107)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$4.onMessage(GridCachePartitionExchangeManager.java:291)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$4.onMessage(GridCachePartitionExchangeManager.java:289)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:1635)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:1617)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:582)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:280)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:204)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:80)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:163)
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:821)
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$1600(GridIoManager.java:103)
    at org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:784)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send
message to remote node: TcpDiscoveryNode
[id=b247699c-8545-40a4-9b9c-aa478ea3ca55, addrs=[0:0:0:0:0:0:0:1%lo,
10.120.70.122, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47501,
/0:0:0:0:0:0:0:1%lo:47501, /10.120.70.122:47501, /127.0.0.1:47501],
discPort=47501, order=2, intOrder=2, lastExchangeTime=1461673945474, loc=false,
ver=1.5.0#20151229-sha1:f1f8cda2, isClient=false]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1959)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1899)
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1077)
    ... 20 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to
node (is node still alive?). Make sure that each GridComputeTask and
GridCacheTransaction has a timeout set in order to prevent parties from waiting
forever in case of network issues [nodeId=b247699c-8545-40a4-9b9c-aa478ea3ca55,
addrs=[/10.120.70.122:47101, /0:0:0:0:0:0:0:1%lo:47101, /127.0.0.1:47101]]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2462)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2103)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:1997)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1933)
    ... 22 more
    Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /10.120.70.122:47101
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2467)
        ... 25 more
    Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read remote node recovery handshake (connection closed).
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2672)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2334)
        ... 25 more
    Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /0:0:0:0:0:0:0:1%lo:47101
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2467)
        ... 25 more
    Caused by: class org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected [expected=b247699c-8545-40a4-9b9c-aa478ea3ca55, rcvd=58708335-cd7e-4e54-b86a-73a63da9ed4d]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2577)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2334)
        ... 25 more
    Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /127.0.0.1:47101
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2467)
        ... 25 more
    Caused by: class org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected [expected=b247699c-8545-40a4-9b9c-aa478ea3ca55, rcvd=58708335-cd7e-4e54-b86a-73a63da9ed4d]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2577)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2334)
        ... 25 more
[17:27:27] Topology snapshot [ver=71, servers=1, clients=0, CPUs=8, heap=7.0GB]
[17:27:28,403][SEVERE][exchange-worker-#50%null%][GridCachePartitionExchangeManager]
Failed to send local partition map to node [node=TcpDiscoveryNode
[id=98ba7f1a-2815-4a05-b083-420066840ce5, addrs=[0:0:0:0:0:0:0:1%lo,
10.120.70.122, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500,
/0:0:0:0:0:0:0:1%lo:47500, /10.120.70.122:47500, /127.0.0.1:47500],
discPort=47500, order=1, intOrder=1, lastExchangeTime=1461673938529, loc=false,
ver=1.5.0#20151229-sha1:f1f8cda2, isClient=false], exchId=null]
class org.apache.ignite.IgniteCheckedException: Failed to send message (node
may have left the grid or TCP connection cannot be established due to firewall
issues) [node=TcpDiscoveryNode [id=98ba7f1a-2815-4a05-b083-420066840ce5,
addrs=[0:0:0:0:0:0:0:1%lo, 10.120.70.122, 127.0.0.1],
sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /0:0:0:0:0:0:0:1%lo:47500,
/10.120.70.122:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
lastExchangeTime=1461673938529, loc=false, ver=1.5.0#20151229-sha1:f1f8cda2,
isClient=false], topic=TOPIC_CACHE, msg=GridDhtPartitionsSingleMessage
[parts={1=GridDhtPartitionMap2 [moving=0, size=111],
-2146922738=GridDhtPartitionMap2 [moving=0, size=111],
745661760=GridDhtPartitionMap2 [moving=0, size=111],
-2100569601=GridDhtPartitionMap2 [moving=0, size=100],
-1071296927=GridDhtPartitionMap2 [moving=0, size=111],
-1667118441=GridDhtPartitionMap2 [moving=0, size=111],
689859866=GridDhtPartitionMap2 [moving=0, size=111],
810756007=GridDhtPartitionMap2 [moving=0, size=111],
-1582327725=GridDhtPartitionMap2 [moving=0, size=111],
1316949047=GridDhtPartitionMap2 [moving=0, size=111],
1325947219=GridDhtPartitionMap2 [moving=0, size=20]}, partCntrs=null,
client=false, super=GridDhtPartitionsAbstractMessage [exchId=null,
lastVer=GridCacheVersion [topVer=73153881, nodeOrderDrId=7,
globalTime=1461749161553, order=1461727649806], super=GridCacheMessage
[msgId=12784, depInfo=null, err=null, skipPrepare=false, cacheId=0,
cacheId=0]]], policy=2]
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1082)
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1146)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.sendNoRetry(GridCacheIoManager.java:873)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.sendLocalPartitions(GridCachePartitionExchangeManager.java:814)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.refreshPartitions(GridCachePartitionExchangeManager.java:705)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.refreshPartitions(GridCachePartitionExchangeManager.java:724)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1600(GridCachePartitionExchangeManager.java:107)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1267)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.spi.IgniteSpiException: Failed to send
message to remote node: TcpDiscoveryNode
[id=98ba7f1a-2815-4a05-b083-420066840ce5, addrs=[0:0:0:0:0:0:0:1%lo,
10.120.70.122, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500,
/0:0:0:0:0:0:0:1%lo:47500, /10.120.70.122:47500, /127.0.0.1:47500],
discPort=47500, order=1, intOrder=1, lastExchangeTime=1461673938529, loc=false,
ver=1.5.0#20151229-sha1:f1f8cda2, isClient=false]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1959)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1899)
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1077)
    ... 9 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to connect to
node (is node still alive?). Make sure that each GridComputeTask and
GridCacheTransaction has a timeout set in order to prevent parties from waiting
forever in case of network issues [nodeId=98ba7f1a-2815-4a05-b083-420066840ce5,
addrs=[/10.120.70.122:47100, /0:0:0:0:0:0:0:1%lo:47100, /127.0.0.1:47100]]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2462)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2103)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:1997)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1933)
    ... 11 more
    Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /10.120.70.122:47100
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2467)
        ... 14 more
    Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read remote node recovery handshake (connection closed).
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2672)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2334)
        ... 14 more
    Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /0:0:0:0:0:0:0:1%lo:47100
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2467)
        ... 14 more
    Caused by: class org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected [expected=98ba7f1a-2815-4a05-b083-420066840ce5, rcvd=8cbb2885-4f9f-4547-8b2b-b55e64cb3579]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2577)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2334)
        ... 14 more
    Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to address: /127.0.0.1:47100
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2467)
        ... 14 more
    Caused by: class org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected [expected=98ba7f1a-2815-4a05-b083-420066840ce5, rcvd=8cbb2885-4f9f-4547-8b2b-b55e64cb3579]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2577)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2334)
        ... 14 more