>> But data exchange and node discovery are taking place in different SPI.
I'm just worried about whether the discovery thread has high enough priority
to finish the join process first when the communication threads are also
very busy.

So when a new server node joins the topology, the coordinator adds it to its
local NodeRing and then starts partition rebalancing without waiting for all
server nodes to confirm that the join is done (that is, without waiting until
the coordinator receives the NodeAddFinishedMessage again), right?

Could you share a design doc or diagram of how the discovery and
communication SPIs interact during the join process?

>> Can you provide log files from all nodes?
I'll try, but there are a lot of logs, because I turned on DEBUG for the
Discovery and Communication SPIs. Do you have any suggestions on which
modules should have DEBUG enabled?

BTW, could you suggest some keywords to search for in the logs, so that I can
extract the relevant lines and make debugging easier for you?
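Once I have the keywords, I'd pull the matching lines out of each node's log
with something like the minimal sketch below. It assumes one plain-text log
file per node, and the keyword list in it is only my guess (the discovery
message class names TcpDiscoveryNodeAddedMessage and
TcpDiscoveryNodeAddFinishedMessage, plus the "Topology snapshot" line), which
is exactly what I'm hoping you can correct.

```python
import sys
from pathlib import Path
from typing import Iterable, List

# Hypothetical keywords: my own guess at what is worth extracting,
# not a list recommended by the Ignite developers.
DEFAULT_KEYWORDS = (
    "TcpDiscoveryNodeAddedMessage",
    "TcpDiscoveryNodeAddFinishedMessage",
    "Topology snapshot",
)


def filter_lines(lines: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Return only the lines containing at least one keyword (case-sensitive)."""
    keys = tuple(keywords)
    return [line for line in lines if any(k in line for k in keys)]


def filter_logs(paths: Iterable[str],
                keywords: Iterable[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Filter several node log files, prefixing each hit with its file name."""
    keys = tuple(keywords)
    hits: List[str] = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in filter_lines(f, keys):
                hits.append(f"{Path(path).name}: {line.rstrip()}")
    return hits


if __name__ == "__main__":
    # Usage: python filter_logs.py node1.log node2.log node3.log
    for hit in filter_logs(sys.argv[1:]):
        print(hit)
```

Prefixing each line with the file name keeps the output from different
nodes distinguishable after the files are merged into one extract.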

Thanks,
-Jason



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Fail-to-join-topology-and-repeat-join-process-tp6987p7084.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
