Thank you so much, Mark. The pointers were helpful & definitely in the right direction.
The number of FlowFiles was huge because the MySQL CDC processor had not been running for a couple of days, resulting in an accumulation of binlog entries. And whenever I tried processing those, the CPU would max out at 100% and nodes would fall out of the cluster, which helped in no way. Now all of it makes a lot of sense. Thanks, folks!

On Mon, Jun 10, 2019 at 7:17 PM Mark Payne <marka...@hotmail.com> wrote:

> I don't know that this is actually unexpected. What you observed is that
> you had millions of FlowFiles queued up to be processed. NiFi was not
> processing them with 100% CPU utilization. This typically indicates one of
> two things: a) you haven't allocated enough threads, or b) you have a
> bottleneck other than CPU - likely disk I/O.
>
> Once you restarted NiFi, you were in a situation where you had improved
> your disk I/O. If you were previously not at 100% CPU utilization due to a
> disk I/O bottleneck, and you then removed that bottleneck by improving disk
> I/O like you mentioned, then it makes sense that NiFi would now start
> consuming more CPU - even up to 100% - to handle those millions of
> FlowFiles that are queued up.
>
> On Jun 10, 2019, at 9:07 AM, Joe Witt <joe.w...@gmail.com> wrote:
>
> Buffering FlowFiles like that is supported by design and common, so it
> would be ideal to figure out what happened.
>
> On Mon, Jun 10, 2019, 9:02 AM Shanker Sneh <shanker.s...@zoomcar.com>
> wrote:
>
>> FlowFiles were close to ~7 million .. 8 threads (as I have 4 vCPUs in 1
>> box). Max heap allocated is 12 GB, so the usage was ~60%.
>>
>> Joe, I think it has something to do with what Wookcock suggested.
>> Clearing up content & FlowFiles seems to have made the CPU manageable.
>> Allow me 1-2 days and I shall report back if it solves the problem.
>>
>> On Mon, Jun 10, 2019 at 6:23 PM Joe Witt <joe.w...@gmail.com> wrote:
>>
>>> How many FlowFiles were in queue? How many threads for NiFi to use?
>>> How was heap?
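[Editor's note: Mark's point (a) above maps to NiFi's "Maximum Timer Driven Thread Count" setting, for which a commonly cited community rule of thumb is roughly 2-4x the number of cores. A minimal sketch of that heuristic; the function name and factors are illustrative, not any NiFi API:]

```shell
# Rule-of-thumb sizing for NiFi's "Maximum Timer Driven Thread Count":
# roughly 2-4x the core count is often suggested. This is a heuristic,
# not an official formula - validate against your own workload.
suggest_threads() {
  local cores="$1" factor="${2:-2}"   # factor defaults to the low end (2x)
  echo $(( cores * factor ))
}

suggest_threads 4      # -> 8  (2x on a 4-vCPU box, as in this thread)
suggest_threads 16 4   # -> 64 (upper end on the larger 16-core box)
```

[At the 2x end this matches the 8 threads Sneh configured on the 4-vCPU boxes; note that a disk I/O bottleneck, point (b), makes extra threads useless.]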
>>>
>>> On Mon, Jun 10, 2019, 8:44 AM Shanker Sneh <shanker.s...@zoomcar.com>
>>> wrote:
>>>
>>>> Thanks Joe for reading through and helping me. :)
>>>>
>>>> - NiFi hasn't been upgraded. It's 1.8.0 (community version of
>>>> Hortonworks DataFlow).
>>>> - OS/kernel is the same. Just that I have added more capacity to
>>>> disk (with better I/O).
>>>> - JVM continues to be the same. Java 8.
>>>> - When CPU is at 100%, top shows just the NiFi java process. When I
>>>> provided more cores (as high as 16), NiFi used all 16 cores and
>>>> throttled at 1600%.
>>>>
>>>> Meanwhile, I am trying to clear up all FlowFiles from disk and start
>>>> the flows afresh.
>>>>
>>>> On Mon, Jun 10, 2019 at 5:42 PM Joe Witt <joe.w...@gmail.com> wrote:
>>>>
>>>>> Sneh
>>>>>
>>>>> It was stable for months but now is high...
>>>>>
>>>>> Has NiFi been upgraded? What version before vs. now?
>>>>>
>>>>> Has the OS/kernel been changed?
>>>>>
>>>>> Has the JVM been updated?
>>>>>
>>>>> When CPU is at 100%, what does top show?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, Jun 10, 2019, 7:59 AM Shanker Sneh <shanker.s...@zoomcar.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the suggestions, Joe.
>>>>>> Actually the issue persists even after reverting to the
>>>>>> 'older-regular-incremental-load' version of the data flow (which used
>>>>>> to work fine for months on similarly configured hardware until a few
>>>>>> days ago, utilising just ~50% of resources).
>>>>>>
>>>>>> These days, one node of the 2-node cluster drops out of NiFi every now
>>>>>> and then as the CPU peaks at 100% for that particular machine, and
>>>>>> subsequently the other node reaches 100% CPU too.
>>>>>> When I restart NiFi on a particular node, CPU tanks to 0 and then
>>>>>> spikes to 100% within a few minutes - the data flowing through the
>>>>>> pipeline is far too little to throttle my CPU, ideally.
>>>>>>
>>>>>> The machine config and NiFi config remain untouched - this has left
>>>>>> me confused about where the problem might be.
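[Editor's note: when "top shows just the NiFi java process", it can help to go one level deeper and match the hottest thread to a stack trace. A sketch, assuming a standard NiFi launch; the pgrep pattern and the TID 12345 are placeholders, not values from this thread:]

```shell
# Live steps (commented out here because they need a running NiFi):
#   top -H -p "$(pgrep -f org.apache.nifi)"     # per-thread CPU; note the busiest TID
#   jstack "$(pgrep -f org.apache.nifi)" > threads.txt
#
# jstack prints each thread's native id in hex (nid=0x...), while top shows
# it in decimal, so convert before searching the dump:
tid_to_nid() { printf 'nid=0x%x' "$1"; }

tid_to_nid 12345   # -> nid=0x3039; then: grep -A 20 "$(tid_to_nid 12345)" threads.txt
```

[The resulting stack usually points straight at the hot processor, e.g. a CaptureChangeMySQL thread churning through an accumulated binlog backlog.]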
>>>>>> Something which had been running smoothly for months has become a
>>>>>> challenge now.
>>>>>>
>>>>>> On Fri, Jun 7, 2019 at 8:16 PM Joe Witt <joe.w...@gmail.com> wrote:
>>>>>>
>>>>>>> Shanker
>>>>>>>
>>>>>>> It sounds like you've gone through some changes in general and have
>>>>>>> worked through those. Now you have a flow running with a high volume
>>>>>>> of data (history load) and want to know which parts of the flow are
>>>>>>> most expensive / consuming the CPU.
>>>>>>>
>>>>>>> You should be able to look at the statistics provided on the
>>>>>>> processors to see where the majority of CPU time is spent. You can
>>>>>>> usually very easily reason over this if it is doing
>>>>>>> compression/encryption/etc. and determine if you want to give it more
>>>>>>> threads / fewer threads / batch data together better, etc.
>>>>>>>
>>>>>>> The configuration of the VMs, the NiFi instance itself, the flow,
>>>>>>> and the nature of the data are all important to see/understand to be
>>>>>>> of much help here.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Fri, Jun 7, 2019 at 7:07 AM Shanker Sneh <
>>>>>>> shanker.s...@zoomcar.com> wrote:
>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I am facing a strange issue with NiFi 1.8.0 (2 nodes).
>>>>>>>> My flows had been running fine for months.
>>>>>>>>
>>>>>>>> Yesterday I had to do some history load which filled up both my
>>>>>>>> disks (I have the FlowFile repository on a separate disk).
>>>>>>>>
>>>>>>>> I increased the size of both the root and FlowFile disks, 'grew'
>>>>>>>> the disk partition, and 'extended' the file system (it's an EC2
>>>>>>>> Linux box).
>>>>>>>> But since then my CPU has been spiking to a full 100% - even at
>>>>>>>> regular load (earlier it used to be somewhere around 50%).
>>>>>>>> Also, I made no change to the config values or thread count etc.
>>>>>>>>
>>>>>>>> I upgraded the 2 nodes to see if that solves the problem - from a
>>>>>>>> 16 GB box (4 cores) to 64 GB (16 cores).
>>>>>>>> But even the larger box is throttling the CPU at 100%.
>>>>>>>>
>>>>>>>> I tried clearing all repositories and restarting the NiFi
>>>>>>>> application and the EC2 instance - but no improvement.
>>>>>>>>
>>>>>>>> Kindly point me in the right direction. I am unable to pinpoint
>>>>>>>> anything.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best,
>>>>>>>> Sneh
>>>>>>
>>>>>> --
>>>>>> Best,
>>>>>> Sneh
>>>>
>>>> --
>>>> Best,
>>>> Sneh
>>
>> --
>> Best,
>> Sneh

--
Best,
Sneh
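[Editor's note: the "grow the partition and extend the file system" step described above typically looks like the commands below on EC2 Linux after resizing the EBS volume. This sketch only prints the commands (a dry run), since device names, partition numbers, and filesystem types vary; /dev/xvda, partition 1, and ext4 are assumptions, not values taken from the thread.]

```shell
# Dry run: print the commands that would grow a partition and its filesystem
# after the underlying EBS volume is resized. Nothing here touches a disk.
grow_commands() {
  local disk="$1" partnum="$2" fstype="$3" mountpoint="$4"
  echo "sudo growpart ${disk} ${partnum}"               # grow the partition
  case "$fstype" in
    ext4) echo "sudo resize2fs ${disk}${partnum}" ;;    # ext4 grows via the device
    xfs)  echo "sudo xfs_growfs ${mountpoint}" ;;       # xfs grows via the mountpoint
  esac
}

grow_commands /dev/xvda 1 ext4 /
```

[On NVMe-based instance types the partition device looks like /dev/nvme0n1p1, so the simple `${disk}${partnum}` concatenation above is itself a simplification. Note that resizing alone would not explain the CPU spike; per the resolution at the top of the thread, the accumulated CDC binlog backlog was the actual cause.]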