Thank you so much Mark.
The pointers were helpful & definitely in the right direction.

The #flow files was huge because MySQL CDC processor had not been running
since a couple of days - resulting in accumulation of bin log entries.
And whenever I tried processing those, the CPU would max at 100% and nodes
would fall out of the cluster - helping in no way.

Now all of it makes lot of sense. Thanks folks!

On Mon, Jun 10, 2019 at 7:17 PM Mark Payne <marka...@hotmail.com> wrote:

> I don't know that this is actually unexpected. What you observed is that
> you had million of FlowFiles queued up to be processed. NiFi was not
> processing them with 100% CPU utilization. This typically indicates one of
> two things: a) You haven't allocated enough threads, or b) you have a
> bottleneck other than CPU - likely Disk I/O.
>
> Once you restarted NiFi, you were in a situation where you had improved
> your disk I/O. If you were previously not at 100% CPU utilization due to a
> Disk I/O bottleneck, and you then removed that bottleneck by improving disk
> I/O like you mentioned, then it makes sense that NiFi would now start
> consuming more CPU - even up to 100% - to handle those millions of
> FlowFiles that are queued up.
>
>
>
> On Jun 10, 2019, at 9:07 AM, Joe Witt <joe.w...@gmail.com> wrote:
>
> buffering flowfiles like that is supported by design and common so it
> would be ideal to figure out what happened.
>
> On Mon, Jun 10, 2019, 9:02 AM Shanker Sneh <shanker.s...@zoomcar.com>
> wrote:
>
>> Flowfiles were close to ~7 million .. 8 threads (as I have 4 vCPU in 1
>> box). Max heap allocated is 12Gb. So the usage was ~60%
>>
>> Joe, I think it has something to do with what Wookcock suggested.
>> Clearing up content & FlowFiles seem to have CPU manageable.
>> Allow me 1-2 days and I shall report back if it solves the problem.
>>
>> On Mon, Jun 10, 2019 at 6:23 PM Joe Witt <joe.w...@gmail.com> wrote:
>>
>>> how many flowfiles were in queue?  how many threads for nifi to use?
>>>  how was heap?
>>>
>>> On Mon, Jun 10, 2019, 8:44 AM Shanker Sneh <shanker.s...@zoomcar.com>
>>> wrote:
>>>
>>>> Thanks Joe for reading through and helping me. :)
>>>>
>>>>
>>>>    - NiFi hasn't been upgraded. its 1.8.0 (community version of Horton
>>>>    works data flow).
>>>>    - OS/Kernel is the same. Just that I have added more capacity to
>>>>    disk (with better IO).
>>>>    - JVM continues to be the same. Java 8.
>>>>    - When CPU is 100%, top shoes just NiFi java process. When I
>>>>    provided with more cores (as high as 16), NiFi used all 16 nodes and
>>>>    throttled at 1600%.
>>>>
>>>>
>>>> Meanwhile, I am trying to clear up all FlowFiles from disk and start
>>>> the flows afresh.
>>>>
>>>>
>>>> On Mon, Jun 10, 2019 at 5:42 PM Joe Witt <joe.w...@gmail.com> wrote:
>>>>
>>>>> Sneh
>>>>>
>>>>> It was stable for months but now is high...
>>>>>
>>>>> has nifi been upgraded?  what version before vs now?
>>>>>
>>>>> has the os/kernel been changed?
>>>>>
>>>>> has the jvm been updated?
>>>>>
>>>>> when cpu is 100 what does top show?
>>>>>
>>>>> thanks
>>>>>
>>>>> On Mon, Jun 10, 2019, 7:59 AM Shanker Sneh <shanker.s...@zoomcar.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the suggestions Joe.
>>>>>> Actually the issue is persistent even after reverting to the
>>>>>> 'older-regular-incremental-load' of the data flow* (which used to
>>>>>> work fine since months on similarly-configured hardware a few days back 
>>>>>> by
>>>>>> utilising just ~50% of resources)*.
>>>>>>
>>>>>> These days, one of the 2-node cluster gets out of NiFi every now and
>>>>>> then as the CPU peaks 100% for that particular machine. And subsequently
>>>>>> the other node reaches 100% CPU too.
>>>>>> When I restart NiFi on a particular node, CPU tanks to 0 and then
>>>>>> spikes to 100% within few minutes - the data flowing through the pipeline
>>>>>> is *just too less* to throttle my CPU ideally.
>>>>>>
>>>>>> The machine config and NiFi config remains untouched - this has left
>>>>>> me confused where the problem might be. Something which had been running
>>>>>> smoothly since months, has become a challenge now.
>>>>>>
>>>>>> On Fri, Jun 7, 2019 at 8:16 PM Joe Witt <joe.w...@gmail.com> wrote:
>>>>>>
>>>>>>> Shanker
>>>>>>>
>>>>>>> It sounds like you've gone through some changes in general and have
>>>>>>> worked through those.  Now you have a flow running with a high volume of
>>>>>>> data (history load) and want to know which parts of the flow are most
>>>>>>> expensive/consuming the CPU.
>>>>>>>
>>>>>>> You should be able to look at the statistics provided on the
>>>>>>> processors to see where the majority of CPU time is spent.  You can 
>>>>>>> usually
>>>>>>> very easily reason over this if it is doing compression/encryption/etc..
>>>>>>> and determine if you want to give it more threads/less threads/batch 
>>>>>>> data
>>>>>>> together better, etc..
>>>>>>>
>>>>>>> The configuration of the VMs, the NiFi instance itself, the flow,
>>>>>>> and the nature of the data are all important to see/understand to be of
>>>>>>> much help here.
>>>>>>>
>>>>>>> THanks
>>>>>>>
>>>>>>> On Fri, Jun 7, 2019 at 7:07 AM Shanker Sneh <
>>>>>>> shanker.s...@zoomcar.com> wrote:
>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I am facing strange issue with NiFi 1.8.0 (2 nodes)
>>>>>>>> My flows had been running fine since months.
>>>>>>>>
>>>>>>>> Yesterday I had to do some history load which filled up my both
>>>>>>>> disks (I have FlowFile repository as separate disk).
>>>>>>>>
>>>>>>>> I increased the size of the root & flowflile disk both. And 'grow'
>>>>>>>> the disk partition and 'extended' the file system (it's an EC2 linux).
>>>>>>>> But post that my CPU has been spiking to complete 100% - even at
>>>>>>>> regular load (earlier it used to be somewhere around 50%)
>>>>>>>> Also I did no change to the config values or thread count etc.
>>>>>>>>
>>>>>>>> I upgraded the 2 nodes to see if that solves the problem - from 16
>>>>>>>> Gb box (4 core) to 64 Gb (16 core).
>>>>>>>> But even the larger box is throttling on the CPU at 100%.
>>>>>>>>
>>>>>>>> I tried clearing all repositories and restarted NiFi application
>>>>>>>> and the EC2 - but no improvement.
>>>>>>>>
>>>>>>>> Kindly point me in the right direction. I am unable to pinpoint
>>>>>>>> anything.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best,
>>>>>>>> Sneh
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best,
>>>>>> Sneh
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Best,
>>>> Sneh
>>>>
>>>
>>
>> --
>> Best,
>> Sneh
>>
>
>

-- 
Best,
Sneh

Reply via email to