Re: NiFi cluster goes 100% CPU in no time

Joe Witt Mon, 10 Jun 2019 06:08:28 -0700

Buffering FlowFiles like that is supported by design and is common, so it
would be ideal to figure out what happened.
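
One way to narrow down what happened is to match the busiest threads from
`top -H -p <pid>` against a `jstack <pid>` thread dump. This is a minimal
sketch, not something taken from this thread. It assumes Java 8 style dump
lines, where each thread header carries the native thread id as hex
(`nid=0x...`), while `top -H` reports the same id in decimal. The function
name `hot_threads` and the two sample dump lines are made up for
illustration.

```python
import re

# Matches a jstack thread header line such as:
#   "some thread" #52 daemon prio=5 os_prio=0 tid=0x... nid=0x1a2b runnable [...]
HEADER = re.compile(r'^"(?P<name>[^"]+)".*\bnid=0x(?P<nid>[0-9a-fA-F]+)')


def hot_threads(dump_text, tids):
    """Map decimal thread ids (as shown by `top -H`) to jstack thread names."""
    # jstack prints the native id in hex, so convert the decimal ids first.
    wanted = {format(tid, "x"): tid for tid in tids}
    found = {}
    for line in dump_text.splitlines():
        match = HEADER.match(line)
        if match and match.group("nid").lower() in wanted:
            found[wanted[match.group("nid").lower()]] = match.group("name")
    return found


# Illustrative dump lines only (hypothetical values, not real output).
SAMPLE = (
    '"Timer-Driven Process Thread-3" #52 daemon prio=5 os_prio=0 '
    'tid=0x00007f2c1c0a1000 nid=0x1a2b runnable [0x00007f2bd8dfe000]\n'
    '"FileSystemRepository Workers Thread-1" #41 daemon prio=5 os_prio=0 '
    'tid=0x00007f2c1c0b2000 nid=0x1a30 waiting on condition [0x00007f2bd8eff000]\n'
)

if __name__ == "__main__":
    # 6699 decimal is 0x1a2b, so this picks out the first sample thread.
    print(hot_threads(SAMPLE, [6699]))
```

Feeding it a real dump plus the top few thread ids from `top -H` shows
whether the CPU is going to processor threads, repository threads, or
garbage collection.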


On Mon, Jun 10, 2019, 9:02 AM Shanker Sneh <shanker.s...@zoomcar.com> wrote:

> There were close to 7 million FlowFiles queued, with 8 threads (as I have
> 4 vCPUs in 1 box). Max heap allocated is 12 GB, and the usage was ~60%.
>
> Joe, I think it has something to do with what Wookcock suggested. Clearing
> up content & FlowFiles seems to have made the CPU manageable.
> Allow me 1-2 days and I shall report back if it solves the problem.
>
> On Mon, Jun 10, 2019 at 6:23 PM Joe Witt <joe.w...@gmail.com> wrote:
>
>> how many flowfiles were in queue?  how many threads for nifi to use?
>>  how was heap?
>>
>> On Mon, Jun 10, 2019, 8:44 AM Shanker Sneh <shanker.s...@zoomcar.com>
>> wrote:
>>
>>> Thanks Joe for reading through and helping me. :)
>>>
>>>
>>>    - NiFi hasn't been upgraded. It's 1.8.0 (community version of
>>>    Hortonworks DataFlow).
>>>    - OS/Kernel is the same. Just that I have added more capacity to
>>>    disk (with better IO).
>>>    - JVM continues to be the same. Java 8.
>>>    - When CPU is 100%, top shows just the NiFi Java process. When I
>>>    provided more cores (as many as 16), NiFi used all 16 cores and
>>>    peaked at 1600%.
>>>
>>>
>>> Meanwhile, I am trying to clear up all FlowFiles from disk and start the
>>> flows afresh.
>>>
>>>
>>> On Mon, Jun 10, 2019 at 5:42 PM Joe Witt <joe.w...@gmail.com> wrote:
>>>
>>>> Sneh
>>>>
>>>> It was stable for months but now is high...
>>>>
>>>> has nifi been upgraded?  what version before vs now?
>>>>
>>>> has the os/kernel been changed?
>>>>
>>>> has the jvm been updated?
>>>>
>>>> when cpu is 100 what does top show?
>>>>
>>>> thanks
>>>>
>>>> On Mon, Jun 10, 2019, 7:59 AM Shanker Sneh <shanker.s...@zoomcar.com>
>>>> wrote:
>>>>
>>>>> Thanks for the suggestions Joe.
>>>>> Actually, the issue persists even after reverting to the
>>>>> 'older-regular-incremental-load' of the data flow *(which had worked
>>>>> fine for months on similarly configured hardware until a few days
>>>>> ago, utilising just ~50% of resources)*.
>>>>>
>>>>> These days, one node of the 2-node cluster drops out of the NiFi
>>>>> cluster every now and then as the CPU peaks at 100% on that particular
>>>>> machine. Subsequently, the other node reaches 100% CPU too.
>>>>> When I restart NiFi on a particular node, CPU drops to 0 and then
>>>>> spikes to 100% within a few minutes. The data flowing through the
>>>>> pipeline is *far too little* to saturate my CPU.
>>>>>
>>>>> The machine config and NiFi config remain untouched, which has left
>>>>> me confused about where the problem might be. Something that had been
>>>>> running smoothly for months has become a challenge now.
>>>>>
>>>>> On Fri, Jun 7, 2019 at 8:16 PM Joe Witt <joe.w...@gmail.com> wrote:
>>>>>
>>>>>> Shanker
>>>>>>
>>>>>> It sounds like you've gone through some changes in general and have
>>>>>> worked through those.  Now you have a flow running with a high volume of
>>>>>> data (history load) and want to know which parts of the flow are most
>>>>>> expensive/consuming the CPU.
>>>>>>
>>>>>> You should be able to look at the statistics provided on the
>>>>>> processors to see where the majority of CPU time is spent.  You can
>>>>>> usually very easily reason over this if it is doing
>>>>>> compression/encryption/etc. and determine if you want to give it more
>>>>>> threads/fewer threads/batch data together better, etc.
>>>>>>
>>>>>> The configuration of the VMs, the NiFi instance itself, the flow, and
>>>>>> the nature of the data are all important to see/understand to be of much
>>>>>> help here.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Fri, Jun 7, 2019 at 7:07 AM Shanker Sneh <shanker.s...@zoomcar.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I am facing a strange issue with NiFi 1.8.0 (2 nodes).
>>>>>>> My flows had been running fine for months.
>>>>>>>
>>>>>>> Yesterday I had to do a history load, which filled up both of my
>>>>>>> disks (I have the FlowFile repository on a separate disk).
>>>>>>>
>>>>>>> I increased the size of both the root and FlowFile disks, then grew
>>>>>>> the disk partition and extended the file system (it's an EC2 Linux box).
>>>>>>> But since then my CPU has been spiking to a full 100%, even at
>>>>>>> regular load (earlier it used to be somewhere around 50%).
>>>>>>> Also, I made no changes to the config values, thread count, etc.
>>>>>>>
>>>>>>> I upgraded the 2 nodes to see if that solves the problem, from a
>>>>>>> 16 GB box (4 cores) to a 64 GB box (16 cores).
>>>>>>> But even the larger box is pegged at 100% CPU.
>>>>>>>
>>>>>>> I tried clearing all repositories and restarting the NiFi application
>>>>>>> and the EC2 instance, but saw no improvement.
>>>>>>>
>>>>>>> Kindly point me in the right direction. I am unable to pinpoint
>>>>>>> anything.
>>>>>>>
>>>>>>> --
>>>>>>> Best,
>>>>>>> Sneh
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Best,
>>>>> Sneh
>>>>>
>>>>
>>>
>>> --
>>> Best,
>>> Sneh
>>>
>>
>
> --
> Best,
> Sneh
>
