Hi Minh,

You should find messages in nifi-app.log about the node failing to respond 
within the cluster communication timeout configured in nifi.properties. You may 
want to increase that timeout and see whether it reduces the number of 
disconnect events.
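As a rough sketch of where to look, this Python snippet lists the cluster timeout and heartbeat settings from a nifi.properties file. It assumes simple key=value lines and filters on the nifi.cluster. prefix. The property names in the sample (nifi.cluster.node.connection.timeout, nifi.cluster.node.read.timeout, nifi.cluster.protocol.heartbeat.interval) and their values are illustrative only, so confirm the exact names and defaults in the Administration Guide for your NiFi version.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments.

    Simplified on purpose: it ignores the ':' separator, line continuations
    and escape sequences that java.util.Properties also supports.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def cluster_timeouts(props):
    """Return nifi.cluster.* settings whose key mentions a timeout or heartbeat."""
    return {
        key: value
        for key, value in sorted(props.items())
        if key.startswith("nifi.cluster.") and ("timeout" in key or "heartbeat" in key)
    }


if __name__ == "__main__":
    # Illustrative sample only; read your real file with open(...).read().
    sample = """
# Cluster node properties
nifi.cluster.is.node=true
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.web.https.port=8443
"""
    for key, value in cluster_timeouts(parse_properties(sample)).items():
        print(f"{key} = {value}")
```

Filtering by prefix rather than hard-coding names keeps the sketch usable if your version names the settings slightly differently.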

In my experience, nodes typically disconnect when the CPU load gets too high 
or when too many threads are waiting their turn because of slow processors. If 
your nodes are short on RAM, garbage collection may also play a role.
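A minimal sketch of the "load versus CPU count" comparison, run directly on a node's host. It assumes the CPU Load metric in NiFi tracks the operating system load average, which counts threads that want to run (and, on Linux, tasks in uninterruptible I/O wait), and it uses os.getloadavg(), which only exists on Unix-like systems.

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Ratio of threads wanting to run to available CPUs; above 1.0 means they queue."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


def check_local_host():
    if not hasattr(os, "getloadavg"):
        print("os.getloadavg() is not available on this platform")
        return
    one_min, _, fifteen_min = os.getloadavg()
    cpus = os.cpu_count() or 1
    for label, load in (("1 min", one_min), ("15 min", fifteen_min)):
        ratio = load_per_cpu(load, cpus)
        status = "OVERLOADED" if ratio > 1.0 else "ok"
        print(f"{label} load {load:.2f} on {cpus} CPUs -> {ratio:.2f} per CPU ({status})")


if __name__ == "__main__":
    check_local_host()
```

Comparing both the 1-minute and 15-minute values helps tell a short spike apart from a node that is overloaded most of the time.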

Typically, you’ll see other symptoms before the disconnection, such as a slow 
user interface or intermittent errors when navigating. When those occur, check 
the Node Status History from the main menu for these items:

  *   CPU Load: should not be higher than the number of CPUs per node (This is 
not CPU usage % but the number of threads wanting to run).
  *   Heap utilization: should probably look like a sawtooth pattern and not 
stay above 90-95% for any amount of time.
  *   Open Files: should be well below the limit.
  *   If Heap utilization is consistently high, check the garbage collection 
times. These should be a small fraction of the uptime. The collection time in 
the Cluster view is easier to read than in the Node Status History, but the 
24-hour view in the history may give you an idea of when the problem occurs.
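The checklist above can be turned into a small sketch that flags the same warning signs from values you copy out of the Node Status History or Cluster view. The NodeSnapshot type and all thresholds (3 consecutive heap samples at or above 90%, GC above 5% of uptime, open files above 80% of the limit) are hypothetical choices made for illustration, not NiFi defaults; tune them to your own baseline.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds, chosen for illustration only.
HEAP_HIGH_PCT = 90.0
HEAP_MAX_HIGH_RUN = 3       # consecutive samples allowed at or above HEAP_HIGH_PCT
GC_MAX_FRACTION = 0.05      # garbage collection time as a fraction of uptime
OPEN_FILES_MAX_RATIO = 0.8  # open files as a fraction of the limit


@dataclass
class NodeSnapshot:
    heap_used_pct: List[float]  # heap utilization samples in %, oldest first
    gc_time_ms: float           # cumulative GC time over the period
    uptime_ms: float            # node uptime over the same period
    open_files: int
    open_files_limit: int


def longest_run_above(samples, threshold):
    """Length of the longest streak of consecutive samples >= threshold."""
    longest = current = 0
    for value in samples:
        current = current + 1 if value >= threshold else 0
        longest = max(longest, current)
    return longest


def health_warnings(s):
    warnings = []
    # A healthy heap rises and drops (sawtooth); a long high streak is suspicious.
    if longest_run_above(s.heap_used_pct, HEAP_HIGH_PCT) > HEAP_MAX_HIGH_RUN:
        warnings.append(f"heap stays at or above {HEAP_HIGH_PCT:.0f}% instead of a sawtooth")
    if s.uptime_ms > 0 and s.gc_time_ms / s.uptime_ms > GC_MAX_FRACTION:
        warnings.append("garbage collection takes a large share of uptime")
    if s.open_files_limit > 0 and s.open_files / s.open_files_limit > OPEN_FILES_MAX_RATIO:
        warnings.append("open files are close to the limit")
    return warnings


if __name__ == "__main__":
    # Made-up sample values, not real measurements.
    snapshot = NodeSnapshot(
        heap_used_pct=[60, 75, 92, 95, 96, 97, 98],
        gc_time_ms=9 * 60_000,
        uptime_ms=60 * 60_000,
        open_files=900,
        open_files_limit=4096,
    )
    for warning in health_warnings(snapshot):
        print(warning)
```

With the sample values, the heap and garbage collection checks fire while the open files check does not.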

If one or more of these indicate problems, you can often identify the 
processors responsible in the Summary window by sorting on the various columns, 
especially total task duration. For example, it may show a processor using 40 
minutes of task time per 5-minute window because it is holding 8 threads all 
the time.
Those threads may simply be waiting for a slow external system to time out, so 
the task duration can be high even when CPU use is low.
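The 40-minutes-per-5-minute-window example is simple division: task time divided by the window length gives the average number of threads a processor held. This sketch ranks processors that way; it assumes durations are copied as HH:MM:SS.mmm strings, and the processor names and numbers are hypothetical.

```python
def parse_duration_minutes(text):
    """Convert an assumed 'HH:MM:SS.mmm' duration string to minutes."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 60 + int(minutes) + float(seconds) / 60


def average_busy_threads(total_task_minutes, window_minutes=5.0):
    """Average number of threads held during the window (task time / window)."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    return total_task_minutes / window_minutes


def rank_processors(durations_by_name, window_minutes=5.0):
    """Sort processors by average busy threads, highest first."""
    ranked = [
        (name, average_busy_threads(parse_duration_minutes(d), window_minutes))
        for name, d in durations_by_name.items()
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical processor names and durations for one 5-minute window.
    sample = {
        "ProcessorA": "00:02:30.000",
        "ProcessorB": "00:40:00.000",
        "ProcessorC": "00:10:00.000",
    }
    for name, threads in rank_processors(sample):
        print(f"{name}: {threads:.1f} threads on average")
```

A processor near the top of this ranking with low CPU use is a good candidate for the "waiting on a slow external system" case.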

I hope this helps. I haven’t found a bulletproof way to diagnose these issues 
in NiFi yet, especially on clusters running many processors. Maybe someone else 
has a more focused approach.

Regards,

Isha

From: e-soci...@gmx.fr <e-soci...@gmx.fr>
Sent: Friday, 27 October 2023 09:16
To: users@nifi.apache.org
Subject: How find the root cause "Node disconnect to the cluster"


Hello all,

Since I started working with NiFi, I have had a lot of difficulty finding the 
root cause of why a node gets disconnected from the cluster.
I checked nifi-app.log and nifi-bootstrap.log, but it is a bit complicated to 
find the correct information.

Does anybody have some information for me?

Regards

Minh
