Hi,

I'm having an issue with a small (two node) NiFi cluster where the nodes
stop processing any queued flowfiles.  I haven't seen any error messages
logged related to it, and when I attempt to restart the service, NiFi
doesn't respond and the script forcibly kills it.  This leaves multiple
flowfile versions hanging around, and makes me worry that it is causing
data loss.

I'm running the web UI on a different box, and when things stop working, it
stops showing changes to counts in any queues, and the thread count never
changes.  It still reports the nodes as connected and responding, though.

My environment is two 8-CPU systems with 60GB of memory each, with 48GB
given to the NiFi JVM in bootstrap.conf.  I have timer threads limited to
12, and event threads to 4.  The install is on the current Amazon Linux AMI,
using OpenJDK 1.8.0.91 x64.
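
For reference, assuming the stock bootstrap.conf layout, a 48GB heap like
the one described above would look roughly like the fragment below.  The
argument numbers and the matching -Xms value are assumptions based on the
default file, not copied from the actual configuration.

```
# conf/bootstrap.conf (sketch; argument numbers assumed from the stock file)
# Initial and maximum heap size for the NiFi JVM
java.arg.2=-Xms48g
java.arg.3=-Xmx48g
```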

Any ideas, other debug steps, or changes that I can try?  I'm running 0.7.0,
having upgraded from 0.6.1, but this has been occurring with both
versions.  The higher the flowfile volume I push through, the faster this
happens.

Thanks for any help there is to give!

-Aaron Longfield
