Hi, I'd suggest a couple of things. Have you configured backpressure controls on your connections? IIRC, NiFi 1.0.0 adds default thresholds of 10,000 flow files / 1 GB per connection. This can help avoid overwhelming components in a flow.
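To make the backpressure idea concrete, here is a toy Python model of a connection with an object threshold and a data size threshold. It is only a sketch of the concept, not NiFi code: the Connection class, its method names, and run_source are all made up for illustration, and the defaults simply mirror the 10,000 / 1 GB figures mentioned above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Toy model of a queue between two processors (hypothetical, not NiFi's API)."""
    max_objects: int = 10_000       # object threshold (assumed default)
    max_bytes: int = 1024 ** 3      # data size threshold, 1 GB (assumed default)
    queue: deque = field(default_factory=deque)
    queued_bytes: int = 0

    def backpressure_engaged(self) -> bool:
        # Backpressure applies once either threshold is reached.
        return (len(self.queue) >= self.max_objects
                or self.queued_bytes >= self.max_bytes)

    def offer(self, payload: bytes) -> bool:
        """Enqueue only if backpressure is not engaged; report whether it was accepted."""
        if self.backpressure_engaged():
            return False  # the upstream component should not run right now
        self.queue.append(payload)
        self.queued_bytes += len(payload)
        return True

    def poll(self) -> bytes:
        """Downstream component takes one item, freeing capacity."""
        payload = self.queue.popleft()
        self.queued_bytes -= len(payload)
        return payload


def run_source(conn: Connection, attempts: int, size: int = 1024) -> int:
    """Try to emit `attempts` payloads of `size` bytes; return how many were accepted."""
    accepted = 0
    for _ in range(attempts):
        if conn.offer(b"x" * size):
            accepted += 1
    return accepted
```

With a small threshold, e.g. Connection(max_objects=5), a source that attempts 8 emissions gets only 5 accepted until the downstream side polls something off the queue. That is the effect you want in a stress flow like the GenerateFlowFile one: the generator idles instead of piling up work.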
Next, a 2-core CPU is really inadequate for a high-throughput system; see if you can get something better. It seems there's a lot going on in your cluster. A full NiFi node with many flows does a lot of housekeeping in the background and needs some power.

Andrew

On Fri, Oct 28, 2016, 8:36 AM Alessio Palma <alessio.pa...@buongiorno.com> wrote:

> Hello Witt,
> before anything else thanks for your help.
> Fortunatly I put down only the NIFI cluster, otherwise I was already in
> vacation :)
>
> After I posted this problem I kept to torture staging NIFI and
> discovered that when CPU LOAD gets very high, nodes loose connection and
> anything starts going in the bad directory. Also the WEB GUI becomes not
> responsive, you have no option to stop workflows.
>
> You can reproduce this issue starting some workflows composed by
> 1) GenerateFlowFile ( 1 Kb size, Timer driven, 0 sec run schedule )
> 2) ReplaceText ( just to force the use of regexp )
> 3) HashContent, ( auto terminate both relationships )
>
> Currently my staging cluster is composed by 2 virtual host configured as:
> 2 Core cpu ( Intel(R) Xeon(R) CPU E7- 2870 @ 2.40GHz )
> 2 GB RAM
> 18 GB HD
>
> The problem raised when the CPU load goes over 8, this basically means
> when you start 8 of the above WF.
>
> I noticed NIFI attempts to reduce the load but this does not works too
> much and does not avoid the general failure.
>
> Here you can see the errors which started to show under stress:
>
> https://drive.google.com/drive/folders/0B7NTMIqrCjESN0JURnRtZWp5Tms?usp=sharing
>
> The 1st question is: is here a way to keep the load under some critical
> values? Is there some "how to" which helps me to configure NIFI ?
> Currently it is using the factory settings and no customization has been
> performed but LDAP login.
>
> AP
>
> On 28/10/2016 13:24, Joe Witt wrote:
> > Alessio
> >
> > You have two clusters here potentially. The NiFi cluster and the
> > Hadoop cluster. Which one went down?
> >
> > If NiFi went down I'd suspect memory exhaustion issues because other
> > resource exhaustion issues like full file system, exhausted file
> > handles, pegged CPU, etc.. tend not to cause it to restart. If memory
> > related you'll probably see something in the nifi-app.log. Try going
> > with a larger heap as can be controlled in conf/bootstrap.conf.
> >
> > Thanks
> > Joe
> >
> > On Fri, Oct 28, 2016 at 5:55 AM, Alessio Palma
> > <alessio.pa...@buongiorno.com> wrote:
> >> Hello all,
> >> yesterday, for a mistake, basically I executed " ls -R / " using the
> >> ListHDFS processor and the whole cluster gone down ( not just a node ).
> >>
> >> Something like this also happened when I was playing with some DO WHILE
> >> / WHILE DO patterns. I have only the nifi logs and they show the
> >> heartbeat has been lost. About the CPU LOAD, NETWORK TRAFFIC I have no
> >> info. Any pointers about where do I have look for the problem's root ?
> >>
> >> Today I'm trying to repeat the problems I got with DO/WHILE, nothing bad
> >> is happening although CPU LOAD is enough high and NETWORK TRAFFIC
> >> increased up to 282 Kb/sec.
> >>
> >> Of course I can redo the "ls -R /" on production, however I like to
> >> avoid it since there are already some ingestion flows running.
> >>
> >> AP