RE: Ignite node is down

Ilya Korol Wed, 24 Aug 2022 22:34:36 -0700

Hi,

According to the logs, you have the following:

[17:02:04,774][INFO][grid-nio-worker-tcp-comm-0-#40%MATCHERWORKER%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/xx.xx.xxx.IP2:47100, rmtAddr=/xx.xx.xxx.IP1:52166]
[18:54:12,125][SEVERE][exchange-worker-#65%MATCHERWORKER%][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=grid-nio-worker-tcp-comm-0, threadName=grid-nio-worker-tcp-comm-0-#40%MATCHERWORKER%, blockedFor=10s]
[18:54:14,059][WARNING][exchange-worker-#65%MATCHERWORKER%][G] Thread [name="grid-nio-worker-tcp-comm-0-#40%MATCHERWORKER%", id=63, state=RUNNABLE, blockCnt=0, waitCnt=0]

[18:54:14,062][INFO][tcp-disco-sock-reader-[d6d591bd xx.xx.xxx.IP4:48633]-#8%MATCHERWORKER%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/xx.xx.xxx.IP4:48633, rmtPort=48633
[18:54:14,067][INFO][tcp-disco-srvr-[:47500]-#3%MATCHERWORKER%][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/xx.xx.xxx.IP3, rmtPort=39939]
[18:54:14,067][INFO][tcp-disco-srvr-[:47500]-#3%MATCHERWORKER%][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/xx.xx.xxx.IP3, rmtPort=39939]
[18:54:14,068][INFO][tcp-disco-sock-reader-[]-#11%MATCHERWORKER%][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/xx.xx.xxx.IP3:39939, rmtPort=39939]
[18:54:14,072][INFO][tcp-disco-sock-reader-[439d63a3 xx.xx.xxx.IP3:39939]-#11%MATCHERWORKER%][TcpDiscoverySpi] Initialized connection with remote server node [nodeId=439d63a3-4d24-4e53-9507-581ce7e2347e, rmtAddr=/xx.xx.xxx.IP3:39939]
[18:54:14,075][INFO][tcp-disco-sock-reader-[439d63a3 xx.xx.xxx.IP3:39939]-#11%MATCHERWORKER%][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/xx.xx.xxx.IP3:39939, rmtPort=39939
[18:54:14,097][WARNING][exchange-worker-#65%MATCHERWORKER%][] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=grid-nio-worker-tcp-comm-0, igniteInstanceName=MATCHERWORKER, finished=false, heartbeatTs=1660915454055]]]
class org.apache.ignite.IgniteException: GridWorker [name=grid-nio-worker-tcp-comm-0, igniteInstanceName=MATCHERWORKER, finished=false, heartbeatTs=1660915454055]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1810)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1805)
    at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:234)
    at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3096)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3063)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at java.base/java.lang.Thread.run(Thread.java:829)

It would be great to have a thread dump for the corresponding time, to see what kind of work *grid-nio-worker-tcp-comm-0-#40%MATCHERWORKER%* was performing. (Actually, it's better to have logs and thread dumps for the corresponding time from the whole cluster.)

You mentioned that this is one of the worker nodes. What does that mean? Do you run some specific tasks there, like IgniteCompute tasks or something similar?
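In case it helps with collecting the thread dump: the usual way is to run `jstack <pid>` (or `jcmd <pid> Thread.print`) against the node's process while the thread is reported as blocked. If you would rather capture it from code running inside the node's JVM, a minimal sketch using only the standard JDK management API could look like the one below. The class name `ThreadDumpSketch` and the name filter are my own illustrative choices, not something provided by Ignite.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: dumps stack traces of threads in the current JVM. */
public class ThreadDumpSketch {
    /**
     * Returns a textual dump of all live threads whose name contains nameFilter.
     * An empty filter matches every thread.
     */
    public static String dump(String nameFilter) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();

        // Request lock information only when the JVM supports it,
        // otherwise dumpAllThreads would throw UnsupportedOperationException.
        ThreadInfo[] infos = bean.dumpAllThreads(
            bean.isObjectMonitorUsageSupported(),
            bean.isSynchronizerUsageSupported());

        for (ThreadInfo info : infos) {
            if (!info.getThreadName().contains(nameFilter))
                continue;

            sb.append('"').append(info.getThreadName()).append('"')
              .append(" id=").append(info.getThreadId())
              .append(" state=").append(info.getThreadState())
              .append('\n');

            for (StackTraceElement el : info.getStackTrace())
                sb.append("    at ").append(el).append('\n');

            sb.append('\n');
        }

        return sb.toString();
    }

    public static void main(String[] args) {
        // Example: pass "grid-nio-worker" to see only the communication workers.
        System.out.print(dump(args.length > 0 ? args[0] : ""));
    }
}
```

This only sees threads of the JVM it runs in, so it has to be invoked from within the Ignite node process (for example from a scheduled task in your application). For a one-off investigation, `jstack` from the outside is simpler.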

Meanwhile, you can also try updating your setup to a more recent version, such as 2.13.




On  2022/08/25 05:04:43 BEELA GAYATRI via user wrote:
> TCS Confidential
>
> Dear Team,
>
>

> We have 4 worker nodes in the baseline topology. Every time, one of the nodes goes down with the message "Node is out of topology (probably, due to short-time network problems)". The Ignite log and configuration are attached to this mail.

> Here are a few queries:
>
> 1. Why does one of the nodes go down with that error every time?
> 2. If a node goes down, is there any way it can be started automatically, without manual intervention?
>
> Thanks & Regards,
> Gayatri Beela
>
>
