Aaron,

My guess is that you are hitting a full garbage collection. With a Java heap 
as large as 48 GB, a full GC can cause a "stop the world" pause that lasts 
quite a long time. Which garbage collector are you using? Have you tried 
reducing the heap from 48 GB to, say, 4 or 8 GB?
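
One way to check whether long collections are really happening is to look at 
the JVM's own collector counters. The following is only a minimal sketch, not 
something taken from NiFi: the class name GcSnapshot and the method describe() 
are made up for illustration, and the code reports on the JVM it runs in, so 
pointing it at the NiFi process would need a remote JMX connection that is not 
shown here. It uses the standard java.lang.management API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of NiFi): prints per-collector GC totals
// for the JVM this class is running in.
public class GcSnapshot {

    // Builds one line per collector: name, number of collections so far,
    // and accumulated collection time in milliseconds.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", totalMillis=").append(gc.getCollectionTime())
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

If the old-generation collector's total time jumps by many seconds between two 
snapshots, that would be consistent with a long full GC pause. For an already 
running process, `jstat -gcutil <pid> 1000` from the JDK shows similar 
counters once per second without any code.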

Thanks
-Mark


> On Jul 14, 2016, at 11:14 AM, Aaron Longfield <alongfi...@gmail.com> wrote:
> 
> Hi,
> 
> I'm having an issue with a small (two node) NiFi cluster where the nodes will 
> stop processing any queued flowfiles.  I haven't seen any error messages 
> logged related to it, and when attempting to restart the service, NiFi 
> doesn't respond and the script forcibly kills it.  This causes multiple 
> flowfile versions to hang around, and generally makes me feel like it might be 
> causing data loss.
> 
> I'm running the web UI on a different box, and when things stop working, it 
> stops showing changes to counts in any queues, and the thread count never 
> changes.  It still thinks the nodes are connecting and responding, though.
> 
> My environment is two 8 cpu systems w/ 60GB memory with 48GB given to the 
> NiFi JVM in bootstrap.conf.  I have timer threads limited to 12, and event 
> threads to 4.  Install is on the current Amazon Linux AMI and using OpenJDK 
> 1.8.0.91 x64.
> 
> Any ideas, other debug steps, or changes that I can try?  I'm running 0.7.0, 
> having upgraded from 0.6.1, but this has been occurring with both versions.  
> The higher the flowfile volume I push through, the faster this happens.
> 
> Thanks for any help there is to give!
> 
> -Aaron Longfield
