Hello!

This looks like a network problem, a long GC pause on the server node, or
some kind of deadlock on the server node that prevents it from responding.
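
To narrow down the last two possibilities, something like the following could be run inside the server node's JVM (for example from a scheduled task). It is only a minimal sketch using the standard java.lang.management API, not anything Ignite-specific; the class name NodeHealthProbe and its method names are made up for illustration. Note that findDeadlockedThreads() only reports threads deadlocked on monitors or ownable synchronizers, so a thread that is merely parked waiting for a reply that never arrives will not show up here; a thread dump taken with jstack while the problem is occurring is still the more reliable evidence.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not part of Ignite): reports GC time and deadlocked threads
// for the JVM it runs in.
final class NodeHealthProbe {
    /** Accumulated GC time in milliseconds since JVM start, summed over all collectors. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Number of threads the JVM currently considers deadlocked (0 when there are none). */
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is detected
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println("GC time since JVM start (ms): " + totalGcTimeMillis());
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
    }
}
```

Logging these two values periodically and comparing them around the time of the failure should show whether the accumulated GC time jumps by several seconds or whether a deadlock is reported just before the client starts timing out.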

Regards,
-- 
Ilya Kasnacheev


On Wed, 24 Feb 2021 at 13:09, oguzhan <[email protected]> wrote:

> Hello,
>
> We have 1 client node and 1 server node, and we are using Ignite version
> 2.9.1.
>
> Our application is scheduled to run the same jobs every day. It ran without
> any errors for 2 weeks, but then we started getting the error shown below
> (it recurs about every 2 weeks):
>
> I hope you can help me solve this problem. Thanks and best regards...
>
>
> 2021-02-14 02:07:34 WARN  tcp-client-disco-reconnector-#7-#77756
> TcpDiscoverySpi:576 - Failed to connect to any address from IP finder (will
> retry to join topology every 2000 ms; change 'reconnectDelay' to configure
> the frequency of retries): [/127.0.0.1:47500, /127.0.0.1:47501,
> /127.0.0.1:47502, /127.0.0.1:47503, /127.0.0.1:47504, /127.0.0.1:47505,
> /127.0.0.1:47506, /127.0.0.1:47507, /127.0.0.1:47508, /127.0.0.1:47509]
> 2021-02-14 02:07:37 INFO  grid-timeout-worker-#206 IgniteKernal:566 -
> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>     ^-- Node [id=2fefd66f, uptime=4 days, 13:33:34.341]
>     ^-- Cluster [hosts=1, CPUs=16, servers=1, clients=1, topVer=2,
> minorTopVer=18985]
>     ^-- Network [addrs=[10.86.26.180, 127.0.0.1], discoPort=0,
> commPort=47101]
>     ^-- CPU [CPUs=16, curLoad=1.07%, avgLoad=0.05%, GC=0.1%]
>     ^-- Heap [used=865MB, free=92.96%, comm=12274MB]
>     ^-- Off-heap memory [used=0MB, free=100%, allocated=0MB]
>     ^-- Page memory [pages=0]
>     ^--   sysMemPlc region [type=internal, persistence=false,
> lazyAlloc=false,
>       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%,
> allocRam=0MB]
>     ^--   TxLog region [type=internal, persistence=false, lazyAlloc=false,
>       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%,
> allocRam=0MB]
>     ^--   Default_Region region [type=default, persistence=false,
> lazyAlloc=true,
>       ...  initCfg=256MB, maxCfg=32768MB, usedRam=0MB, freeRam=100%,
> allocRam=0MB]
>     ^-- Outbound messages queue [size=0]
>     ^-- Public thread pool [active=0, idle=0, qSize=0]
>     ^-- System thread pool [active=0, idle=81, qSize=0]
> 2021-02-14 02:07:38 ERROR tcp-client-disco-sock-writer-#2-#230
> TcpDiscoverySpi:586 - Failed to send message: null
> java.io.IOException: Failed to get acknowledge for message:
> TcpDiscoveryClientMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage
> [sndNodeId=null, id=1d467368771-2fefd66f-0954-45dd-aa32-a33e58567950,
> verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null,
> isClient=true]]
>         at
>
> org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter.body(ClientImpl.java:1471)
>         at
> org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
> 2021-02-14 02:07:44 WARN  tcp-comm-worker-#1-#216 TcpCommunicationSpi:576 -
> Handshake timed out (will stop attempts to perform the handshake)
> [node=6953d599-d606-4781-a6ba-43de7aff59e4,
> connTimeoutStrategy=ExponentialBackoffTimeoutStrategy [maxTimeout=600000,
> totalTimeout=10000, startNanos=1671033974906026, currTimeout=600000],
> err=Operation timed out [timeoutStrategy= ExponentialBackoffTimeoutStrategy
> [maxTimeout=600000, totalTimeout=10000, startNanos=1671033974906026,
> currTimeout=600000]], addr=/127.0.0.1:47100,
> failureDetectionTimeoutEnabled=true, timeout=0]
> 2021-02-14 02:07:54 WARN  tcp-comm-worker-#1-#216 TcpCommunicationSpi:576 -
> Handshake timed out (will stop attempts to perform the handshake)
> [node=6953d599-d606-4781-a6ba-43de7aff59e4,
> connTimeoutStrategy=ExponentialBackoffTimeoutStrategy [maxTimeout=600000,
> totalTimeout=10000, startNanos=1671044002786218, currTimeout=600000],
> err=Operation timed out [timeoutStrategy= ExponentialBackoffTimeoutStrategy
> [maxTimeout=600000, totalTimeout=10000, startNanos=1671044002786218,
> currTimeout=600000]], addr=dwccatp01/10.86.26.180:47100,
> failureDetectionTimeoutEnabled=true, timeout=0]
> 2021-02-14 02:08:06 ERROR grid-timeout-worker-#206 G:581 - Blocked
> system-critical thread has been detected. This can lead to cluster-wide
> undefined behaviour [workerName=tcp-comm-worker,
> threadName=tcp-comm-worker-#1-#216, blockedFor=11s]
> 2021-02-14 02:08:06 WARN  grid-timeout-worker-#206 root:576 - Possible
> failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]
>         at sun.misc.Unsafe.park(Native Method)
>         at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>         at
>
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>         at
>
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>         at
>
> org.apache.ignite.spi.discovery.tcp.ClientImpl.pingNode(ClientImpl.java:449)
>         at
>
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.pingNode(TcpDiscoverySpi.java:493)
>         at
>
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.pingNode(GridDiscoveryManager.java:1688)
>         at
>
> org.apache.ignite.internal.managers.GridManagerAdapter$1.pingNode(GridManagerAdapter.java:409)
>         at
>
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:5165)
>         at
>
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4951)
>         at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>         at
>
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2503)
>         at
> org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
> [02:08:06] Possible failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> 2021-02-14 02:08:07 WARN  grid-timeout-worker-#206
> CacheDiagnosticManager:571 - Page locks dump:
>
>
> 2021-02-14 02:08:16 ERROR grid-timeout-worker-#206 G:581 - Blocked
> system-critical thread has been detected. This can lead to cluster-wide
> undefined behaviour [workerName=tcp-comm-worker,
> threadName=tcp-comm-worker-#1-#216, blockedFor=21s]
> 2021-02-14 02:08:16 WARN  grid-timeout-worker-#206 root:576 - Possible
> failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]
>         at sun.misc.Unsafe.park(Native Method)
>         at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>         at
>
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>         at
>
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>         at
>
> org.apache.ignite.spi.discovery.tcp.ClientImpl.pingNode(ClientImpl.java:449)
>         at
>
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.pingNode(TcpDiscoverySpi.java:493)
>         at
>
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.pingNode(GridDiscoveryManager.java:1688)
>         at
>
> org.apache.ignite.internal.managers.GridManagerAdapter$1.pingNode(GridManagerAdapter.java:409)
>         at
>
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:5165)
>         at
>
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4951)
>         at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>         at
>
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2503)
>         at
> org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
> [02:08:16] Possible failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> 2021-02-14 02:08:16 WARN  grid-timeout-worker-#206
> CacheDiagnosticManager:571 - Page locks dump:
>
>
> 2021-02-14 02:08:28 ERROR grid-timeout-worker-#206 G:581 - Blocked
> system-critical thread has been detected. This can lead to cluster-wide
> undefined behaviour [workerName=tcp-comm-worker,
> threadName=tcp-comm-worker-#1-#216, blockedFor=33s]
> 2021-02-14 02:08:28 WARN  grid-timeout-worker-#206 root:576 - Possible
> failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]
>         at sun.misc.Unsafe.park(Native Method)
>         at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>         at
>
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>         at
>
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>         at
>
> org.apache.ignite.spi.discovery.tcp.ClientImpl.pingNode(ClientImpl.java:449)
>         at
>
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.pingNode(TcpDiscoverySpi.java:493)
>         at
>
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.pingNode(GridDiscoveryManager.java:1688)
>         at
>
> org.apache.ignite.internal.managers.GridManagerAdapter$1.pingNode(GridManagerAdapter.java:409)
>         at
>
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:5165)
>         at
>
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4951)
>         at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>         at
>
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2503)
>         at
> org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
> [02:08:28] Possible failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> 2021-02-14 02:08:28 WARN  grid-timeout-worker-#206
> CacheDiagnosticManager:571 - Page locks dump:
>
>
> 2021-02-14 02:08:32 WARN  http-nio-8082-exec-5 TcpCommunicationSpi:576 -
> Handshake timed out (will stop attempts to perform the handshake)
> [node=6953d599-d606-4781-a6ba-43de7aff59e4,
> connTimeoutStrategy=ExponentialBackoffTimeoutStrategy [maxTimeout=600000,
> totalTimeout=10000, startNanos=1671081715938786, currTimeout=600000],
> err=Operation timed out [timeoutStrategy= ExponentialBackoffTimeoutStrategy
> [maxTimeout=600000, totalTimeout=10000, startNanos=1671081715938786,
> currTimeout=600000]], addr=/127.0.0.1:47100,
> failureDetectionTimeoutEnabled=true, timeout=0]
> 2021-02-14 02:08:37 ERROR grid-timeout-worker-#206 G:581 - Blocked
> system-critical thread has been detected. This can lead to cluster-wide
> undefined behaviour [workerName=tcp-comm-worker,
> threadName=tcp-comm-worker-#1-#216, blockedFor=42s]
> 2021-02-14 02:08:37 WARN  grid-timeout-worker-#206 root:576 - Possible
> failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]
>         at sun.misc.Unsafe.park(Native Method)
>         at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>         at
>
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>         at
>
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>         at
>
> org.apache.ignite.spi.discovery.tcp.ClientImpl.pingNode(ClientImpl.java:449)
>         at
>
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.pingNode(TcpDiscoverySpi.java:493)
>         at
>
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.pingNode(GridDiscoveryManager.java:1688)
>         at
>
> org.apache.ignite.internal.managers.GridManagerAdapter$1.pingNode(GridManagerAdapter.java:409)
>         at
>
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.processDisconnect(TcpCommunicationSpi.java:5165)
>         at
>
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$CommunicationWorker.body(TcpCommunicationSpi.java:4951)
>         at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>         at
>
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$5.body(TcpCommunicationSpi.java:2503)
>         at
> org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
> [02:08:37] Possible failure suppressed accordingly to a configured handler
> [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
> o.a.i.IgniteException: GridWorker [name=tcp-comm-worker,
> igniteInstanceName=null, finished=false, heartbeatTs=1613257674823]]]
> 2021-02-14 02:08:37 WARN  grid-timeout-worker-#206
> CacheDiagnosticManager:571 - Page locks dump:
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
