Hi,

Most likely one of the nodes has a hung transaction, future, or lock, or even a deadlock. That is why new nodes can't join the cluster: they can't complete the exchange while an operation is still pending. Please share the full logs from all nodes together with thread dumps; that will help find the root cause.
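In case it helps with collecting the thread dumps: the usual way is to run the JDK's `jstack <pid>` against each node's JVM. If that is not convenient, below is a minimal sketch that produces a similar dump from inside the JVM using the standard `java.lang.management` API, and also reports any deadlocked threads the JVM itself detects. The class name `ThreadDumpSketch` and its method names are made up for illustration and are not part of Ignite; it only sees the JVM it runs in, so it would have to be invoked inside each node's process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

// Hypothetical helper, not part of Ignite: dumps the threads of the current JVM.
public class ThreadDumpSketch {

    /** Returns the IDs of deadlocked threads, or an empty array if none are detected. */
    static long[] deadlockedThreadIds() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        return ids == null ? new long[0] : ids;
    }

    /** Builds a text dump of all live threads with their full stack traces. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo ti : infos) {
            sb.append('"').append(ti.getThreadName()).append("\" ")
              .append(ti.getThreadState());
            if (ti.getLockName() != null) {
                sb.append(" waiting on ").append(ti.getLockName());
            }
            if (ti.getLockOwnerName() != null) {
                sb.append(" held by \"").append(ti.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            for (StackTraceElement frame : ti.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
        System.out.println("Deadlocked thread IDs: "
                + Arrays.toString(deadlockedThreadIds()));
    }
}
```

Taking two or three dumps a few seconds apart on each node makes it easier to tell a thread that is really stuck from one that is just busy.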
Evgenii

2018-01-16 5:35 GMT+03:00 [email protected] <[email protected]>:

> Hi All,
>
> We have a ignite cluster running about 20+ nodes, for any case JVM
> memory issue we schedule reboot those nodes at middle night.
>
> but in order to keep the service supplied, we reboot them one by one like
> A,B,C,D nodes we reboot them at 5 mins delay; but if we doing so, the
> reboot nodes can never join to the cluster again.
>
> Eventually the entire cluster can not work any more forever waiting for
> joining to the topology; we need to kill all and reboot from started, this
> sound incredible.
>
> I not sure whether any more meet this issue before, or any mistake we may
> make, attached is the ignite log.
>
>
> Thanks for your time!
>
> Regards
> Aaron
> ------------------------------
> Aaron.Kuai
>
