I've since brought the node back up - no change.  It looks like the IO is all related to the flowfile repository.  When it's running, CPU is pretty high - usually ~12 cores (i.e. top will show 1200%) per node.  I'm using the XFS filesystem; maybe some FS parameters would help?

The big change is that I was using Kafka for queuing, and have redone my flow so that it uses only NiFi's internal queuing. This was working great with a small amount of data (100k records), but bringing in 8 million started causing this issue.  Even with everything off, as soon as I start one thing, I start getting timeouts and the disks just grind.

-Joe

On 3/22/2023 10:44 AM, Mark Payne wrote:
Sorry, apparently I dropped users@ from my previous reply.

Looking at the diagnostics, garbage collection looks very healthy. Overall CPU usage is also very low. The one thing that did strike me as interesting, though, is that you have one node in the cluster shut down. While this shouldn’t cause issues (or if it does, only for a few seconds until all the nodes realize it’s disconnected), I’m curious whether you started seeing issues only after that node was shut down?

I also noted that you have the read timeout set to 5 secs in nifi.properties:
nifi.cluster.node.read.timeout : 5 sec

That might be worth increasing.
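
A minimal sketch of that change in nifi.properties, assuming 30 sec purely as an illustrative value (it is not a figure taken from the diagnostics). The connection timeout is a separate property and is shown only as something that may be worth reviewing alongside the read timeout. Changes to nifi.properties take effect after restarting each node.

```properties
# Illustrative values only (assumption): raise from the current 5 sec
# so that requests replicated between nodes are less likely to time out under load.
nifi.cluster.node.read.timeout=30 sec
# Assumption: the connection timeout may be worth reviewing at the same time.
nifi.cluster.node.connection.timeout=30 sec
```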

> On Mar 22, 2023, at 10:24 AM, Joe Obernberger <joseph.obernber...@gmail.com> wrote:
>
> Hi Mark - thank you so much for helping me.
> Any thoughts on the attached?
>
> -Joe
>
> On 3/22/2023 10:21 AM, Mark Payne wrote:
>> You can see how busy garbage collection is by running “nifi.sh diagnostics diag1.txt” and then looking at the diag1.txt file. It’ll contain a lot of information, including garbage collection details.
>>
>> Thanks
>> -Mark
>>
>>
>>> On Mar 22, 2023, at 10:19 AM, Joe Obernberger <joseph.obernber...@gmail.com> wrote:
>>>
>>> atop shows the disk as being all red with IO - 100% utilization. There are a lot of flowfiles currently trying to run through, but I can't monitor it because the UI won't load.
>>>
>>> -Joe
>>>
>>> On 3/22/2023 10:16 AM, Mark Payne wrote:
>>>> Joe,
>>>>
>>>> I’d recommend taking a look at garbage collection. It is far more likely the culprit than disk I/O.
>>>>
>>>> Thanks
>>>> -Mark
>>>>
>>>>> On Mar 22, 2023, at 10:12 AM, Joe Obernberger <joseph.obernber...@gmail.com> wrote:
>>>>>
>>>>> I'm getting "java.net.SocketTimeoutException: timeout" from the user interface of NiFi when load is heavy.  This is 1.18.0 running on a 3-node cluster.  Disk IO is high, and when that happens, I can't get into the UI to stop any of the processors.
>>>>> Any ideas?
>>>>>
>>>>> I have put the flowfile repository and content repository on different disks on the 3 nodes, but disk usage is still so high that I can't get in.
>>>>> Thank you!
>>>>>
>>>>> -Joe
>>>>>
>>>>>
>>>>> --
>>>>> This email has been checked for viruses by AVG antivirus software.
>>>>> www.avg.com <http://www.avg.com>

