Hello,

We see this behaviour during our client startup:

Client 1 -
Client1.txt <http://apache-ignite-users.70518.x6.nabble.com/file/t2757/Client1.txt>
Server003 log - SERVER3.TXT <http://apache-ignite-users.70518.x6.nabble.com/file/t2757/SERVER3.TXT>

1. The client joined the cluster at around 2021-01-14T16:16:29 according to both client1.log and server003.log. This started the partition map exchange.

2. client1.log then reports a blocked system-critical thread:

2021-01-14T16:16:44,472 ERROR o.a.i.i.u.t.G [grid-timeout-worker-#21%InstanceName%]: Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=partition-exchanger, threadName=exchange-worker-#37%InstanceName%, blockedFor=14s]

After this, we see the connection to machinename003 time out:

2021-01-14T16:16:45,017 WARN o.a.i.s.c.t.TcpCommunicationSpi [exchange-worker-#37%InstanceName%]: Connection timed out (will stop attempts to perform the connect) [node=7b9d8c6f-814c-4cb6-9822-8fa3d7f79eb7, connTimeoutStgy=ExponentialBackoffTimeoutStrategy [maxTimeout=10000, totalTimeout=15000, startNanos=14942834332684124, currTimeout=10000], failureDetectionTimeoutEnabled=false, timeout=9938, err=null, addr=/m.n.o.202:47130]

3. If I look into the logs on machinename003, the partition map exchange finished in 8 ms:

2021-01-14T16:16:29,671 INFO o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture [exchange-worker-#566]: Exchange timings [startVer=AffinityTopologyVersion [topVer=47, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=47, minorTopVer=0], stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters initialization" (0 ms), stage="Determine exchange type" (4 ms), stage="Exchange done" (4 ms), stage="Total time" (8 ms)]

If so, why was the partition-exchanger blocked on the client?

4. In spite of the connection timeout, the client still manages to connect to machinename003 successfully. (Please note that m.n.o.202 and x.y.z.202 are IP addresses of the same server, machinename003.)
2021-01-14T16:16:45,026 INFO o.a.i.s.c.t.TcpCommunicationSpi [grid-nio-worker-tcp-comm-0-#22%InstanceName%]: Established outgoing communication connection [locAddr=/a.b.c.21:53607, rmtAddr=machinename003.cmc.local/x.y.z.202:47130]

Kindly guide us on what happened here.

=========================

5. Also, we have configured the TcpCommunicationSpi timeouts as below, as per the recommendation given in:
http://apache-ignite-users.70518.x6.nabble.com/IgniteSpiOperationTimeoutException-Operation-timed-out-timeoutStrategy-ExponentialBackoffTimeoutStray-tp34196p34377.html

<bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi" scope="prototype">
    <property name="connectTimeout" value="5000"/>
    <property name="maxConnectTimeout" value="10000"/>

Is the observed timeout caused by this setting?

===============================

6. Our TxTimeoutOnPartitionMapExchange is 1 second:

<property name="TxTimeoutOnPartitionMapExchange" value="1000"/>

What is the ideal value for TxTimeoutOnPartitionMapExchange? Should it be something like 50 milliseconds?

=====================================

A similar log captured during client 2 startup is attached as well.
Client2.txt <http://apache-ignite-users.70518.x6.nabble.com/file/t2757/Client2.txt>

Regards,
Veena.