Hello,

We see the following behaviour during client startup:
Client 1 -

Client1.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t2757/Client1.txt>  
Server003 log -

SERVER3.TXT
<http://apache-ignite-users.70518.x6.nabble.com/file/t2757/SERVER3.TXT>  

1. The client joined the cluster at around 2021-01-14T16:16:29 according to
both client1.log and server003.log. This started the partition map exchange.
2. client1.log then shows a blocked system-critical thread:
2021-01-14T16:16:44,472 ERROR o.a.i.i.u.t.G
[grid-timeout-worker-#21%InstanceName%]: Blocked system-critical thread has
been detected. This can lead to cluster-wide undefined behaviour
[workerName=partition-exchanger,
threadName=exchange-worker-#37%InstanceName%, blockedFor=14s]

After this, we see the connection to machinename003 time out.
2021-01-14T16:16:45,017 WARN  o.a.i.s.c.t.TcpCommunicationSpi
[exchange-worker-#37%InstanceName%]: Connection timed out (will stop
attempts to perform the connect) [node=7b9d8c6f-814c-4cb6-9822-8fa3d7f79eb7,
connTimeoutStgy=ExponentialBackoffTimeoutStrategy [maxTimeout=10000,
totalTimeout=15000, startNanos=14942834332684124, currTimeout=10000],
failureDetectionTimeoutEnabled=false, timeout=9938, err=null,
addr=/m.n.o.202:47130]



3. Looking at the logs on machinename003, the partition map exchange
finished in 8 ms (see "Total time" in the entry below).


2021-01-14T16:16:29,671 INFO 
o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture [exchange-worker-#566]:
Exchange timings [startVer=AffinityTopologyVersion [topVer=47,
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=47, minorTopVer=0],
stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters
initialization" (0 ms), stage="Determine exchange type" (4 ms),
stage="Exchange done" (4 ms), stage="Total time" (8 ms)]

If so, why was the partition-exchanger blocked on the client?

4. Despite reporting the connection timeout, the client then connects
successfully to machinename003. (Please note that m.n.o.202 and x.y.z.202 are
IP addresses of the same server, machinename003.)

2021-01-14T16:16:45,026 INFO  o.a.i.s.c.t.TcpCommunicationSpi
[grid-nio-worker-tcp-comm-0-#22%InstanceName%]: Established outgoing
communication connection [locAddr=/a.b.c.21:53607,
rmtAddr=machinename003.cmc.local/x.y.z.202:47130]



Kindly help us understand what happened here.
=========================
5. We have also configured the TcpCommunicationSpi timeouts as below,
following the recommendation given in:

http://apache-ignite-users.70518.x6.nabble.com/IgniteSpiOperationTimeoutException-Operation-timed-out-timeoutStrategy-ExponentialBackoffTimeoutStray-tp34196p34377.html

            <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi" scope="prototype">
                <property name="connectTimeout" value="5000"/>
                <property name="maxConnectTimeout" value="10000"/>
  
Is the timeout we observed caused by this setting?
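
To make the arithmetic behind this question concrete, here is a minimal sketch of how we read the numbers in the log. It is only our assumption about the strategy (timeouts that start at connectTimeout, double on each attempt, are capped at maxConnectTimeout, and stop once totalTimeout is used up), not Ignite's actual implementation. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a capped, doubling connect-timeout schedule.
// This is NOT Ignite code; it only illustrates the assumed arithmetic.
public class BackoffSketch {

    /**
     * Returns the per-attempt timeouts in milliseconds: start at initialMs,
     * double after every attempt, never exceed maxMs, and stop as soon as
     * the sum of the timeouts reaches totalBudgetMs.
     */
    static List<Long> schedule(long initialMs, long maxMs, long totalBudgetMs) {
        if (initialMs <= 0 || maxMs < initialMs) {
            throw new IllegalArgumentException("need 0 < initialMs <= maxMs");
        }
        List<Long> timeouts = new ArrayList<>();
        long current = initialMs;
        long spent = 0;
        while (spent < totalBudgetMs) {
            // The last attempt is trimmed so the total never exceeds the budget.
            long attempt = Math.min(current, totalBudgetMs - spent);
            timeouts.add(attempt);
            spent += attempt;
            current = Math.min(current * 2, maxMs);
        }
        return timeouts;
    }

    public static void main(String[] args) {
        // Values from our configuration and from the log entry:
        // connectTimeout=5000, maxConnectTimeout=10000, totalTimeout=15000.
        List<Long> timeouts = schedule(5000, 10000, 15000);
        long total = timeouts.stream().mapToLong(Long::longValue).sum();
        System.out.println(timeouts + " total=" + total + " ms");
        // prints: [5000, 10000] total=15000 ms
    }
}
```

Under that assumption, one attempt of 5000 ms followed by one of 10000 ms adds up to the totalTimeout=15000 shown in the log, which is close to the blockedFor=14s reported for the exchange worker.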
===============================

6. Our TxTimeoutOnPartitionMapExchange is 1 second:
                <property name="TxTimeoutOnPartitionMapExchange" value="1000"/>

What is the ideal value for TxTimeoutOnPartitionMapExchange? Should it be
something like 50 milliseconds?

=====================================
A similar log captured during client 2 startup is attached as well.
Client2.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t2757/Client2.txt>  

regards,
Veena.


