Hi Joe,

Nothing is load balanced - it's all basic queues.
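[Editor's note: a minimal sketch of what enabling the property Aaron mentions just below could look like in nifi.properties. The value 10 is purely illustrative, not a recommendation, and the reading of its meaning in the comment is an assumption to check against the NiFi admin guide; the node has to be restarted for the change to take effect.]

# nifi.properties - performance tracking sample rate.
# 0 disables tracking; 10 is an illustrative value, assumed here to mean
# that roughly 10% of processor work is timed for diagnostics.
nifi.performance.tracking.percentage=10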
Mark, I'm using NiFi 1.19.1. nifi.performance.tracking.percentage sounds like exactly what I might need. I'll give that a shot.

Richard, I hadn't looked at the rotating logs and/or cleared them out. I'll give that a shot too.

Thank you all. Please keep the suggestions coming.

-Aaron

On Wed, Jan 10, 2024 at 1:34 PM Richard Beare <[email protected]> wrote:

> I had a similar-sounding issue, although not in a Kube cluster. NiFi was
> running in a Docker container and the issue was the log rotation
> interacting with the log file being mounted from the host. The mounted log
> file was not deleted on rotation, meaning that once rotation was triggered
> by log file size it would be continually triggered because the new log file
> was never emptied. The clue was that the content of rotated log files was
> mostly the same, with only a small number of messages appended to each new
> one. Rotating multi-GB logs was enough to destroy performance, especially
> if it was being triggered frequently by debug messages.
>
> On Thu, Jan 11, 2024 at 7:14 AM Aaron Rich <[email protected]> wrote:
>
>> Hi Joe,
>>
>> It's pretty fixed-size objects at a fixed interval - one 5 MB-ish file
>> that we break down into individual rows.
>>
>> I went so far as to create a "stress test" where I have a GenerateFlowFile
>> (creating a fixed 100k file, in batches of 1000, every 0.1s) feeding right
>> into a PutFile. I wanted to see the sustained max. It was very stable and
>> fast for over a week of running - but now it's extremely slow. That was
>> about as simple a data flow as I could think of that still hits all the
>> different resources (CPU, memory, disk).
>>
>> I was thinking maybe it was memory too, but it's slow right at the start
>> when NiFi first comes up. I would expect a memory problem to make it slower
>> over time, and the stress test showed it wasn't something that degraded
>> over time.
>>
>> I'm happy to build any other flows anyone can suggest to help troubleshoot
>> and diagnose the issue.
>>
>> Lars,
>>
>> We haven't changed it between when performance was good and now when it's
>> slow. That is what is throwing me - nothing changed from a NiFi
>> configuration standpoint.
>> My guess is we are getting some throttling/resource contention from our
>> provider, but I can't determine what/where/how. The Grafana cluster
>> dashboards I have don't indicate issues. If there are suggestions for
>> specific cluster metrics to plot or dashboards to use, I'm happy to build
>> them and contribute them back (I do have a dashboard I need to figure out
>> how to share for creating the "status history" plots in Grafana).
>> The repos aren't full, and I even tried blowing them away just to see if
>> that made a difference.
>> I'm not seeing anything new in the logs that indicates an issue... but
>> maybe I'm missing it, so I will look again.
>>
>> By chance, are there any low-level debugging metrics/observability/etc.
>> that would show how long things like writing to the repository disks are
>> taking? Part of me feels this could be a disk I/O resource issue, but I
>> don't know how I can verify that it is or isn't.
>>
>> Thank you all for the help and suggestions - please keep them coming, as
>> I'm grasping at straws right now.
>>
>> -Aaron
>>
>>
>> On Wed, Jan 10, 2024 at 10:10 AM Joe Witt <[email protected]> wrote:
>>
>>> Aaron,
>>>
>>> The usual suspects are memory consumption leading to high GC leading to
>>> lower performance over time, or back pressure in the flow, etc. But your
>>> description does not really fit either exactly. Does your flow see a mix
>>> of large objects and smaller objects?
>>>
>>> Thanks
>>>
>>> On Wed, Jan 10, 2024 at 10:07 AM Aaron Rich <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm running into an odd issue and hoping someone can point me in the
>>>> right direction.
>>>>
>>>> I have NiFi 1.19 deployed in a Kube cluster with all the repositories
>>>> volume mounted out. It was processing great, with processors like
>>>> UpdateAttribute sending through 15K/5m and PutFile sending through
>>>> 3K/5m.
>>>>
>>>> With nothing changing in the deployment, the performance has dropped to
>>>> UpdateAttribute doing 350/5m and PutFile to 200/5m.
>>>>
>>>> I'm trying to determine what resource is suddenly dropping our
>>>> performance like this. I don't see anything in the Kube monitoring that
>>>> stands out, and I have restarted, cleaned repos, and changed nodes, but
>>>> nothing is helping.
>>>>
>>>> I was hoping there is something from the NiFi point of view that can
>>>> help identify the limiting resource. I'm not sure if there is additional
>>>> diagnostic/debug/etc. information available beyond the node status
>>>> graphs.
>>>>
>>>> Any help would be greatly appreciated.
>>>>
>>>> Thanks.
>>>>
>>>> -Aaron
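[Editor's note: on Aaron's question about seeing how long repository writes take - this is not a NiFi-internal metric, but one crude way to rule raw disk latency in or out is to time fsynced writes directly against the volumes backing the repositories. A minimal sketch; the mount paths in REPO_PATHS are assumptions and should be adjusted to the real repository mounts in the pod.]

#!/usr/bin/env python3
# Rough disk-write-latency probe for the volumes backing the NiFi repositories.
import os
import statistics
import time

REPO_PATHS = [
    "/opt/nifi/nifi-current/flowfile_repository",    # assumed mount point
    "/opt/nifi/nifi-current/content_repository",     # assumed mount point
    "/opt/nifi/nifi-current/provenance_repository",  # assumed mount point
]

# ~100 KB payload (roughly the stress-test file size, assuming "100k" meant KB)
PAYLOAD = os.urandom(100 * 1024)
ITERATIONS = 200

def probe(path: str) -> None:
    """Write, fsync and delete a small file repeatedly; report latency stats."""
    latencies_ms = []
    test_file = os.path.join(path, "io_probe.tmp")
    for _ in range(ITERATIONS):
        start = time.perf_counter()
        with open(test_file, "wb") as fh:
            fh.write(PAYLOAD)
            fh.flush()
            os.fsync(fh.fileno())  # force the write through to the device
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        os.remove(test_file)
    latencies_ms.sort()
    print(path)
    print(f"  median: {statistics.median(latencies_ms):7.2f} ms")
    print(f"  p95:    {latencies_ms[int(len(latencies_ms) * 0.95)]:7.2f} ms")
    print(f"  max:    {latencies_ms[-1]:7.2f} ms")

if __name__ == "__main__":
    for repo in REPO_PATHS:
        if os.path.isdir(repo):
            probe(repo)
        else:
            print(f"{repo}: not found (adjust REPO_PATHS)")

If the medians or p95 values come out in the tens of milliseconds, or swing wildly between runs, that would point toward throttled or contended storage from the provider rather than anything inside NiFi.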

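[Editor's note: on Richard's log-rotation clue, a quick way to look for the same symptom is to compare the rotated nifi-app logs for runaway sizes and near-duplicate content. A small sketch; LOG_GLOB is a hypothetical location and should point at wherever the NiFi logs directory is actually mounted.]

#!/usr/bin/env python3
# Flag rotated nifi-app logs that share the same leading content, i.e. the
# "rotation keeps copying the same file" symptom Richard describes.
import glob
import hashlib
import os

LOG_GLOB = "/opt/nifi/nifi-current/logs/nifi-app*.log*"  # assumed location
PREFIX_BYTES = 1024 * 1024  # compare only the first 1 MB of each file

def prefix_digest(path: str) -> str:
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read(PREFIX_BYTES)).hexdigest()

files = sorted(glob.glob(LOG_GLOB), key=os.path.getmtime)
if not files:
    raise SystemExit(f"no files matched {LOG_GLOB}")

seen = {}
for path in files:
    size_mb = os.path.getsize(path) / (1024 * 1024)
    digest = prefix_digest(path)
    dup_of = seen.get(digest)
    flag = f"  <-- same first 1 MB as {dup_of}" if dup_of else ""
    seen.setdefault(digest, os.path.basename(path))
    print(f"{os.path.basename(path):40s} {size_mb:9.1f} MB{flag}")

Rotated files that are multi-GB and flagged as sharing the same leading content would match the pattern Richard describes.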