Hi Minh,

You should have messages in nifi-app.log about the node failing to respond within the cluster communication timeout configured in nifi.properties. You may want to increase that timeout and see whether it reduces the number of disconnect events.
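To see which timeouts are currently set, a small script along these lines may help. It is only a sketch: it assumes the cluster settings use keys starting with `nifi.cluster.` and that the relevant ones have "timeout" in their name, and the function name is my own.

```python
def cluster_timeout_settings(properties_text):
    """Return {key: value} for nifi.cluster.* entries whose key mentions 'timeout'.

    Assumption: the file is in plain key=value properties format.
    """
    settings = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("nifi.cluster.") and "timeout" in key:
            settings[key] = value.strip()
    return settings
```

You would read your own nifi.properties into a string (the location depends on your installation) and pass it to `cluster_timeout_settings` to list the values worth revisiting.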
In my experience, a node typically gets disconnected when the CPU load gets too high or when too many threads are waiting their turn because of slow processors. If your nodes are short on RAM, garbage collection may also play a role. Typically, you’ll see other symptoms before the disconnection, such as a slow interface or intermittent errors when navigating.

When those occur, check the Node Status History from the main menu for these items:

* CPU Load: should not be higher than the number of CPUs per node. (This is not CPU usage % but the number of threads wanting to run.)
* Heap utilization: should probably look like a sawtooth pattern and not stay above 90-95% for any amount of time.
* Open Files: should be well below the limit.
* If Heap utilization is consistently high, check the garbage collection times. These should be a small fraction of the uptime. The collection time in the Cluster view is easier to read than in the Node Status History, but the 24-hour view in the history may give you an idea of when the problem occurs.

If one or more of these indicate problems, you can often see in the summary window which processors are causing them by sorting on the various columns, especially total task duration. That column may, for example, show that a processor is taking 40 minutes per 5-minute window because it is occupying 8 threads all the time. Those threads may just be waiting for a slow external system to time out, so the values can be high even if CPU use is not.

I hope this helps. I haven’t found a bulletproof way to diagnose these issues in NiFi yet, especially on clusters that run many processors. Maybe someone else has a more focused way.
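As a rough illustration of the checks above, here is a sketch that applies them to numbers you copy by hand from the status history. The function and parameter names are hypothetical, and the limits for open files (80% of the limit) and garbage collection (5% of uptime) are my own assumptions for "well below" and "a small fraction"; only the 90% heap figure comes from the rule of thumb above.

```python
def check_node_health(load_avg, cpu_count, heap_samples_pct,
                      open_files, open_files_limit,
                      gc_millis, uptime_millis,
                      heap_limit_pct=90,           # from the 90-95% rule of thumb
                      open_files_ratio_limit=0.8,  # assumed meaning of "well below"
                      gc_fraction_limit=0.05):     # assumed "small fraction"
    """Return the names of the checks that look problematic."""
    warnings = []
    # CPU load counts threads wanting to run, so compare it with the CPU count.
    if load_avg > cpu_count:
        warnings.append("cpu_load")
    # A healthy heap dips after each collection (sawtooth); flag it only
    # when every sample stays at or above the limit.
    if heap_samples_pct and all(s >= heap_limit_pct for s in heap_samples_pct):
        warnings.append("heap")
    if open_files > open_files_ratio_limit * open_files_limit:
        warnings.append("open_files")
    if uptime_millis > 0 and gc_millis / uptime_millis > gc_fraction_limit:
        warnings.append("gc_time")
    return warnings


def average_busy_threads(total_task_minutes, window_minutes=5):
    """Average number of threads a processor kept busy during the window."""
    return total_task_minutes / window_minutes
```

For the example above, `average_busy_threads(40)` gives 8.0, which is how 40 minutes of task time in a 5-minute window translates into 8 threads held continuously.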
Regards,
Isha

From: e-soci...@gmx.fr <e-soci...@gmx.fr>
Sent: Friday, 27 October 2023 09:16
To: users@nifi.apache.org
Subject: How find the root cause "Node disconnect to the cluster"

Hello all,

Since I'm working with NIFI, I got a lot of difficulty to find the root cause why the node is disconnected to the cluster.
I check nifi-app.log nifi-bootstrap.log but it is a bit complicated to find the correct information.
Somebody has some informations for me ?

Regards
Minh