Hello! It seems that you have network problems.
It's possible that your machines have more than one network interface and that some combinations of them are causing problems. Please try setting the localHost property on every node to that node's actual external IP address.

Regards,
--
Ilya Kasnacheev

On Fri, 15 Jan 2021 at 16:19, VeenaMithare <v.mith...@cmcmarkets.com> wrote:

> Hello,
>
> We see this behaviour during our client startup:
> Client 1 -
>
> Client1.txt
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2757/Client1.txt>
> Server003 log -
>
> SERVER3.TXT
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2757/SERVER3.TXT>
>
> 1. The client joined the cluster at around 2021-01-14T16:16:29 according to
> both client1.log and server003.log. This started the partition map exchange.
> 2. client1.log shows a BLOCKED SYSTEM CRITICAL THREAD:
> 2021-01-14T16:16:44,472 ERROR o.a.i.i.u.t.G
> [grid-timeout-worker-#21%InstanceName%]: Blocked system-critical thread has
> been detected. This can lead to cluster-wide undefined behaviour
> [workerName=partition-exchanger,
> threadName=exchange-worker-#37%InstanceName%, blockedFor=14s]
>
> After this, we see the connection to machinename003 time out:
> 2021-01-14T16:16:45,017 WARN o.a.i.s.c.t.TcpCommunicationSpi
> [exchange-worker-#37%InstanceName%]: Connection timed out (will stop
> attempts to perform the connect)
> [node=7b9d8c6f-814c-4cb6-9822-8fa3d7f79eb7,
> connTimeoutStgy=ExponentialBackoffTimeoutStrategy [maxTimeout=10000,
> totalTimeout=15000, startNanos=14942834332684124, currTimeout=10000],
> failureDetectionTimeoutEnabled=false, timeout=9938, err=null,
> addr=/m.n.o.202:47130]
>
> 3. If I look into the logs on machinename003, the partition map exchange
> finished in 8 milliseconds:
>
> 2021-01-14T16:16:29,671 INFO
> o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture [exchange-worker-#566]:
> Exchange timings [startVer=AffinityTopologyVersion [topVer=47,
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=47, minorTopVer=0],
> stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters
> initialization" (0 ms), stage="Determine exchange type" (4 ms),
> stage="Exchange done" (4 ms), stage="Total time" (8 ms)]
>
> If so, why was the partition-exchanger blocked on the client?
>
> 4. In spite of the connection timeout, the client still manages to
> connect to machinename003 successfully. (Please note that m.n.o.202 and
> x.y.z.202 are IP addresses of the same server, machinename003.)
>
> 2021-01-14T16:16:45,026 INFO o.a.i.s.c.t.TcpCommunicationSpi
> [grid-nio-worker-tcp-comm-0-#22%InstanceName%]: Established outgoing
> communication connection [locAddr=/a.b.c.21:53607,
> rmtAddr=machinename003.cmc.local/x.y.z.202:47130]
>
> Kindly advise us on what happened here.
> =========================
> 5. We have also configured the TcpCommunicationSpi timeouts as below,
> following the recommendation given in:
>
> http://apache-ignite-users.70518.x6.nabble.com/IgniteSpiOperationTimeoutException-Operation-timed-out-timeoutStrategy-ExponentialBackoffTimeoutStray-tp34196p34377.html
>
> <bean
> class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi"
> scope="prototype">
>     <property name="connectTimeout" value="5000"/>
>     <property name="maxConnectTimeout" value="10000"/>
>
> Is the timeout observed because of this setting?
> ===============================
>
> 6. Our TxTimeoutOnPartitionMapExchange is 1 second:
>     <property name="TxTimeoutOnPartitionMapExchange" value="1000"/>
>
> What is the ideal value for TxTimeoutOnPartitionMapExchange? Should it be
> something like 50 milliseconds?
>
> =====================================
> A similar log captured during client 2 startup is attached as well.
> Client2.txt
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2757/Client2.txt>
>
> Regards,
> Veena.
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
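For illustration, the localHost suggestion could look roughly like this in the same Spring XML style as the configuration quoted above. This is a minimal sketch, assuming x.y.z.202 (the address the client eventually connected to) is the reachable external address of machinename003; every other node, clients included, would use its own address instead (for client 1 that is presumably a.b.c.21, the local address in its log, but that is an assumption).

```xml
<!-- Sketch only: assumes x.y.z.202 is machinename003's reachable external IP.
     Set a different, node-specific value on every server and client. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- System-wide local address that Ignite components, including the
         discovery and communication SPIs, bind to. Without it, a node may
         bind to and advertise all of its network interfaces. -->
    <property name="localHost" value="x.y.z.202"/>
</bean>
```

Pinning each node to one address should stop peers from trying an address they cannot reach (such as m.n.o.202 here) before falling back to one that works.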