Greetings,

Tuning NiFi for optimal performance on a specific platform depends on which
processors you use.

If you use a lot of Put* processors (like PutHDFS or PutS3Object) then your
NiFi performance is likely going to be bound by I/O delays.  In this case
your NiFi system can probably handle 5 to 10 times as many max timer driven
threads as it has cores (80-160 for 16 cores), because most of those threads
spend their time waiting on I/O rather than using CPU.  You can send many
files over the network simultaneously using PutHDFS, for example.

However, if you use a lot of content conversion processors (like
CompressContent) then your NiFi performance is going to be CPU bound.  In
this case I would limit NiFi max timer driven threads to cores - 2 (16 - 2
= 14).  The last thing you want is for processors to use all 16 cores and
leave the NiFi framework itself without enough CPU time to manage the
repositories or run the web UI.
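
As a rough sketch of the two rules of thumb above, here is how the starting
values could be computed in Python.  The function name and the "io" / "cpu"
labels are made up for illustration and are not part of NiFi; the numbers are
only a starting point to be adjusted while monitoring your own dataflow.

```python
import os


def suggested_max_timer_driven_threads(cores, workload):
    """Return a (low, high) range of thread counts for a starting point.

    workload is a hypothetical label:
      "io"  - flow dominated by I/O-bound processors (e.g. PutHDFS)
      "cpu" - flow dominated by CPU-bound processors (e.g. CompressContent)
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if workload == "io":
        # I/O-bound: roughly 5x to 10x the number of cores.
        return (5 * cores, 10 * cores)
    if workload == "cpu":
        # CPU-bound: leave about 2 cores for the NiFi framework itself,
        # but never suggest fewer than 1 thread.
        threads = max(1, cores - 2)
        return (threads, threads)
    raise ValueError("workload must be 'io' or 'cpu'")


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # cpu_count() can return None
    for workload in ("io", "cpu"):
        low, high = suggested_max_timer_driven_threads(cores, workload)
        print(f"{workload}: {low}-{high} threads on {cores} cores")
```

For a 16 core machine this gives 80-160 for the I/O bound case and 14 for the
CPU bound case, matching the numbers above.  A mixed flow would presumably
land somewhere in between.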

I haven't done a study on the difference between the timer driven and event
driven strategies, and I haven't noticed much difference in practice.  The
event driven strategy probably does well in specific use cases, though there
have been enough optimizations to timer driven polling in NiFi to perhaps
make the event driven strategy obsolete.

Regards,
-- Mike



On Thu, Sep 22, 2016 at 10:02 AM, Pompilio Ramirez <pompibl...@gmail.com>
wrote:

> Hello,
>
> I am building a few servers that will be clustered.
> I have noticed that in the default controller settings I can set the maximum
> thread counts (timer driven and event driven).
>
> In determining this, should I assume that my thread count will be based on
> the number of CPUs my machine sees?
>
> Output of mpstat -P ALL ?
>
> And for instance, if I have 16 CPUs in that output, then I should set my
> max to potentially 16?
>
>
> I've seen increased performance if I set the max timer driven count to
> numbers that are greater than my CPU count, and I assume the underlying
> framework handles that as CPU cycles become available.  However, I am just
> trying to see if there is a rule of thumb, assuming my server's sole purpose
> is to run NiFi.
> What is the community setting those to?  Something like:
>
> (cores) x 3 = max thread count
>
> It will be based on my dataflow and on monitoring my system (CPU, memory,
> I/O, network), but I want to gather others' input.
>
> I am also wondering if given the choice on scheduling strategy, should I
> choose "event driven"?
>
> Thank you for your insight on this.
>
