Hi Joe,

Nothing is load balanced; it's all basic queues.

Mark,
I'm using NiFi 1.19.1.

nifi.performance.tracking.percentage sounds like exactly what I might need.
I'll give that a shot.
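
From a quick read of the admin guide, I believe that's a nifi.properties
setting and that it defaults to 0 (off), so my plan is to set something like
the following and restart; please correct me if I've misread how it's enabled:

    # nifi.properties - sample a small slice of processor work for timing data
    nifi.performance.tracking.percentage=10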

Richard,
I hadn't looked at the rotated logs or cleared them out. I'll give
that a shot too.
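
If it's the kind of loop Richard describes, I'd expect the rotated files to
all be huge and nearly identical. A quick way I'm planning to eyeball that
from inside the pod, just a throwaway script (the log directory is a guess
based on my container layout):

    from pathlib import Path

    # assumed location of the NiFi logs in my container - adjust as needed
    log_dir = Path("/opt/nifi/nifi-current/logs")
    for p in sorted(log_dir.glob("nifi-app*.log*")):
        size_mb = p.stat().st_size / (1024 * 1024)
        print(f"{p.name}\t{size_mb:.1f} MB")

If those all come back multi-GB and roughly the same size, that rotation loop
is probably a big part of my problem.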

Thank you all. Please keep the suggestions coming.

-Aaron

On Wed, Jan 10, 2024 at 1:34 PM Richard Beare <[email protected]>
wrote:

> I had a similar-sounding issue, although not in a Kube cluster. NiFi was
> running in a Docker container, and the issue was the log rotation
> interacting with the log file being mounted from the host. The mounted log
> file was not deleted on rotation, meaning that once rotation was triggered
> by log file size it would be continually re-triggered, because the new log
> file was never emptied. The clue was that the content of the rotated log
> files was mostly the same, with only a small number of messages appended to
> each new one. Rotating multi-GB logs was enough to destroy performance,
> especially when it was being triggered frequently by debug messages.
>
> On Thu, Jan 11, 2024 at 7:14 AM Aaron Rich <[email protected]> wrote:
>
>> Hi Joe,
>>
>> They're pretty fixed-size objects at a fixed interval: one 5 MB-ish file
>> that we break down into individual rows.
>>
>> I went so far as to create a "stress test" where I have a GenerateFlowFile
>> (creating a fixed 100 KB file, in batches of 1000, every 0.1s) feeding right
>> into a PutFile. I wanted to see the sustained max. It was very stable and
>> fast for over a week of running, but now it's extremely slow. That was about
>> as simple a data flow as I could think of that would still hit all the
>> different resources (CPU, memory, etc.).
>>
>> I was thinking it might be memory too, but it's slow right from the start
>> when NiFi comes up. I would expect a memory problem to make it slower over
>> time, and the stress test showed it wasn't something that was fluctuating
>> over time.
>>
>> I'm happy to build any other flows anyone can suggest to help
>> troubleshoot and diagnose the issue.
>>
>> Lars,
>>
>> We haven't changed it between when performance was good and now, when it's
>> slow. That is what is throwing me: nothing has changed from a NiFi
>> configuration standpoint.
>> My guess is we are hitting some throttling/resource contention from our
>> provider, but I can't determine what/where/how. The Grafana cluster
>> dashboards I have don't indicate issues. If there are suggestions for
>> specific cluster metrics to plot or dashboards to use, I'm happy to build
>> them and contribute them back (I do have a dashboard I need to figure out
>> how to share for creating the "status history" plots in Grafana).
>> The repos aren't full, and I even tried blowing them away just to see if
>> that made a difference.
>> I'm not seeing anything new in the logs that indicates an issue... but
>> maybe I'm missing it, so I will look again.
>>
>> By chance, are there any low-level debugging metrics/observability/etc.
>> that would show how long things like writing to the repository disks are
>> taking? Part of me feels like this could be a disk I/O resource issue, but
>> I don't know how to verify whether that is or isn't the case.
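>>
>> One crude thing I may try on the node itself is timing raw writes against
>> one of the repo mounts (to be clear, this is a throwaway sketch with the
>> path and sizes guessed for my deployment, not any official NiFi
>> diagnostic):
>>
>>     import os, time
>>
>>     path = "/opt/nifi/flowfile_repository/iotest.bin"  # assumed repo mount
>>     data = os.urandom(1024 * 1024)  # 1 MB of random bytes
>>     start = time.time()
>>     with open(path, "wb") as f:
>>         for _ in range(100):  # ~100 MB total
>>             f.write(data)
>>         f.flush()
>>         os.fsync(f.fileno())  # make sure it actually hits the disk
>>     print(f"wrote 100 MB in {time.time() - start:.2f}s")
>>     os.remove(path)
>>
>> If raw writes there look bad compared to a node that's behaving, that would
>> at least point at the storage rather than NiFi itself.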
>>
>> Thank you all for the help and suggestions - please keep them coming as
>> I'm grasping at straws right now.
>>
>> -Aaron
>>
>>
>> On Wed, Jan 10, 2024 at 10:10 AM Joe Witt <[email protected]> wrote:
>>
>>> Aaron,
>>>
>>> The usual suspects are memory consumption leading to high GC leading to
>>> lower performance over time, or back pressure in the flow, etc. But your
>>> description does not really fit either exactly. Does your flow see a mix
>>> of large objects and smaller objects?
>>>
>>> Thanks
>>>
>>> On Wed, Jan 10, 2024 at 10:07 AM Aaron Rich <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>>
>>>>
>>>> I’m running into an odd issue and hoping someone can point me in the
>>>> right direction.
>>>>
>>>>
>>>>
>>>> I have NiFi 1.19 deployed in a Kube cluster with all the repositories
>>>> volume-mounted out. It was processing great, with processors like
>>>> UpdateAttribute sending through 15K/5m and PutFile sending through 3K/5m.
>>>>
>>>>
>>>>
>>>> With nothing changing in the deployment, the performance has dropped to
>>>> UpdateAttribute doing 350/5m and PutFile doing 200/5m.
>>>>
>>>>
>>>>
>>>> I’m trying to determine what resource is suddenly dropping our
>>>> performance like this. I don’t see anything in the Kube monitoring that
>>>> stands out, and I have restarted, cleaned repos, and changed nodes, but
>>>> nothing is helping.
>>>>
>>>>
>>>>
>>>> I was hoping there is something from the NiFi POV that can help
>>>> identify the limiting resource. I'm not sure if there is additional
>>>> diagnostic/debug/etc. information available beyond the node status graphs.
>>>>
>>>>
>>>>
>>>> Any help would be greatly appreciated.
>>>>
>>>>
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>>> -Aaron
>>>>
>>>
