Re: How I put the cluster down.

Andrew Grande Fri, 28 Oct 2016 16:53:28 -0700

Hi,

I'd suggest a couple of things. Have you configured backpressure controls on
connections? NiFi 1.0.0 sets a default of 10,000 FlowFiles / 1 GB per
connection, IIRC. This can help avoid overwhelming components in a flow.
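
To illustrate the idea, here is a toy model of an object-count backpressure
threshold. It is only a sketch, not NiFi code: the names Connection and
simulate are made up for this example, and it assumes the upstream component
is simply not scheduled while the queue is at or above the threshold.

```python
from collections import deque


class Connection:
    """Hypothetical stand-in for a queue between two components."""

    def __init__(self, object_threshold):
        self.object_threshold = object_threshold
        self.queue = deque()

    def backpressure_engaged(self):
        # Backpressure applies once the queue holds at least
        # `object_threshold` items.
        return len(self.queue) >= self.object_threshold


def simulate(ticks, threshold, produce_per_tick, consume_per_tick):
    """Run a fast producer against a slow consumer.

    Returns (peak queue length, number of ticks the producer was skipped).
    """
    conn = Connection(threshold)
    peak = 0
    skipped = 0
    for _ in range(ticks):
        if conn.backpressure_engaged():
            # Assumption: the producer is not scheduled at all this tick.
            skipped += 1
        else:
            for _ in range(produce_per_tick):
                conn.queue.append(object())
        peak = max(peak, len(conn.queue))
        # The slower downstream component drains what it can.
        for _ in range(min(consume_per_tick, len(conn.queue))):
            conn.queue.popleft()
    return peak, skipped


if __name__ == "__main__":
    print(simulate(ticks=100, threshold=10,
                   produce_per_tick=5, consume_per_tick=1))
```

In this model the queue can overshoot the threshold by at most one batch,
because the check happens before the producer runs. With a very large
threshold the producer is never skipped and the queue just keeps growing.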


Next, a 2-core CPU is really inadequate for a high-throughput system; see
if you can get something better. It seems there's a lot going on in your
cluster. A full NiFi node with many flows does a lot of housekeeping in the
background and needs some power.
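
To put the numbers from your message in perspective: a load of 8 on a 2-core
host is about 4 runnable tasks per core. A minimal sketch of that arithmetic
follows. The helper name load_per_core is made up for this example, and
os.getloadavg() is only available on Unix-like systems.

```python
import os


def load_per_core(load_average, cores):
    """Return the load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


if __name__ == "__main__":
    # Figures quoted in this thread: load of 8 on a 2-core host.
    print(load_per_core(8.0, 2))

    # Same calculation for the machine this script runs on, if supported.
    try:
        one_minute, _, _ = os.getloadavg()
        print(load_per_core(one_minute, os.cpu_count() or 1))
    except (AttributeError, OSError):
        print("load average not available on this platform")
```

A value well above 1 per core means tasks are queuing for CPU time, which
leaves little room for the background housekeeping mentioned above.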

Andrew

On Fri, Oct 28, 2016, 8:36 AM Alessio Palma <alessio.pa...@buongiorno.com>
wrote:

> Hello Witt,
> before anything else, thanks for your help.
> Fortunately I only took down the NiFi cluster, otherwise I would already
> be on vacation :)
>
> After I posted this problem I kept torturing the staging NiFi cluster and
> discovered that when the CPU load gets very high, nodes lose connection and
> everything starts going in the wrong direction. The web GUI also becomes
> unresponsive, so you have no way to stop workflows.
>
> You can reproduce this issue by starting some workflows composed of:
> 1) GenerateFlowFile (1 KB size, timer driven, 0 sec run schedule)
> 2) ReplaceText (just to force the use of a regexp)
> 3) HashContent (auto-terminate both relationships)
>
> Currently my staging cluster is composed of 2 virtual hosts configured as:
> 2-core CPU (Intel(R) Xeon(R) CPU E7-2870 @ 2.40GHz)
> 2 GB RAM
> 18 GB HD
>
> The problem arises when the CPU load goes over 8, which basically means
> when you start 8 of the above workflows.
>
> I noticed NiFi attempts to reduce the load, but this does not help
> much and does not prevent the general failure.
>
> Here you can see the errors which started to show under stress:
>
> https://drive.google.com/drive/folders/0B7NTMIqrCjESN0JURnRtZWp5Tms?usp=sharing
>
>
> The 1st question is: is there a way to keep the load under some critical
> value? Is there some "how to" which helps me configure NiFi?
> Currently it is using the factory settings and no customization has been
> performed except LDAP login.
>
> AP
>
>
>
> On 28/10/2016 13:24, Joe Witt wrote:
> > Alessio
> >
> > You have two clusters here potentially.  The NiFi cluster and the
> > Hadoop cluster.  Which one went down?
> >
> > If NiFi went down I'd suspect memory exhaustion issues because other
> > resource exhaustion issues like full file system, exhausted file
> > handles, pegged CPU, etc. tend not to cause it to restart.  If memory
> > related you'll probably see something in the nifi-app.log.  Try going
> > with a larger heap as can be controlled in conf/bootstrap.conf.
> >
> > Thanks
> > Joe
> >
> > On Fri, Oct 28, 2016 at 5:55 AM, Alessio Palma
> > <alessio.pa...@buongiorno.com> wrote:
> >> Hello all,
> >> yesterday, by mistake, I basically executed " ls -R / " using the
> >> ListHDFS processor and the whole cluster went down ( not just a node ).
> >>
> >> Something like this also happened when I was playing with some DO WHILE
> >> / WHILE DO patterns. I have only the NiFi logs, and they show the
> >> heartbeat was lost. I have no info about the CPU load or network
> >> traffic. Any pointers on where I should look for the root of the problem?
> >>
> >> Today I'm trying to reproduce the problems I had with DO/WHILE; nothing
> >> bad is happening, although the CPU load is fairly high and network
> >> traffic increased up to 282 Kb/sec.
> >>
> >> Of course I can redo the "ls -R /" on production, but I would like to
> >> avoid it since there are already some ingestion flows running.
> >>
> >> AP
> >
>
