I've added a ticket to the NiFi Jira outlining all the missing properties from the System Administration Guide. We'd really appreciate them getting into the documentation.
https://issues.apache.org/jira/browse/NIFI-9029

As for this issue, it seems like a pretty repeatable process to lock up the canvas and freeze all processing. It would be really great if there were a visual indicator on the canvas to report this to users.
https://issues.apache.org/jira/browse/NIFI-9030

Perhaps the default 2% should also be changed? That seems like a fairly low high-watermark value.

Thanks,
Ryan

On Fri, Aug 6, 2021 at 12:08 PM Ryan Hendrickson <[email protected]> wrote:

> Elli replied to this, although it looks like his email got flagged for
> spam, so I'm replying with his comments to make sure they get through:
>
> We are experiencing this issue as well. We just upgraded from NiFi 1.11.4
> to 1.13.2, and are running into this issue where many of our high-usage
> NiFi instances are just hanging. For example, we have a 7-node cluster that
> has FlowFiles stuck in queues and not moving. We noticed that on 3 of those
> nodes, the FlowFile content storage was over 50%, and those are the nodes
> that have FlowFiles stuck in the queue. The other nodes have nothing on
> them. No new data is flowing into the cluster at all, and nothing is
> moving on any of the nodes. We also see this problem on non-clustered
> machines; the cluster just makes it more obvious that this archive max
> usage percentage might be the cause.
>
> We have a lot of MergeContent processors. We realize that there were a
> lot of I/O improvements in the newer versions of NiFi - Joe, we suspect
> these efficiencies might be exacerbating the problem:
>
> *NiFi 1.13.1 - [full_list]*
>
> - [NIFI-7646] - Improve performance of MergeContent / others that read
>   content of many small FlowFiles
> - [NIFI-8222] - When processing a lot of small FlowFiles, Provenance
>   Repo spends most of its time in lock contention. That can be improved.
>
> *NiFi 1.14.0 - [full list]*
>
> - [NIFI-8633] - Content Repository can be improved to make fewer disk
>   accesses on read.
> - Mark Payne's notes:
>
> *"For those interested in the actual performance numbers here, I ran a
> pretty simple flow that generated a lot of tiny JSON messages, and then
> used ConvertRecord to convert from JSON to Avro. Ran a profiler against it
> and found that about 50% of the time for ConvertRecord was spent in
> FileSystemRepository.read(). This is called twice - once when we read the
> data for inferring the schema, and a second time when we parse the data. Of
> the time spent in FileSystemRepository.read(), about 50% of that time was
> spent in Files.exists(). So this should improve performance of that flow by
> something like 25%."*
>
> We didn't know about the ...archive.backpressure.percentage property - we
> don't see it in the Admin Guide:
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
> We will set this property a lot higher than 2% above the max usage
> percentage and see how it goes. Now that we think about it, we believe
> we've experienced this problem occasionally before the upgrade, but it has
> become very frequent since the upgrade.
>
> On Mon, May 3, 2021 at 1:09 PM Shawn Weeks <[email protected]> wrote:
>
>> Sorry, I wasn't saying that
>> 'nifi.content.repository.archive.max.usage.percentage' was new. I just
>> hadn't managed to get a NiFi instance stuck this way, and even the
>> documentation says that if the archive is empty and the content repo
>> needs more room, it will disable the archive. I'm having trouble finding
>> where 'nifi.content.repository.archive.backpressure.percentage' is
>> documented.
>>
>> Thanks
>>
>> -----Original Message-----
>> From: Mark Payne <[email protected]>
>> Sent: Monday, May 3, 2021 12:00 PM
>> To: [email protected]
>> Subject: Re: NiFi Get's Stuck Waiting On Non Existent Archive Cleanup
>>
>> Shawn,
>>
>> There are a couple of properties at play. The
>> "nifi.content.repository.archive.max.usage.percentage" property behaves
>> as you have described.
>> But there's also a second property:
>> nifi.content.repository.archive.backpressure.percentage
>> This controls at what point the Content Repository will actually apply
>> back-pressure in order to avoid filling the disk. This property defaults
>> to 2% more than the max.usage.percentage. So by default it uses 50% and
>> 52%.
>> You can adjust the backpressure percentage to something much higher, like
>> 80%. Then if you reach 50% it would start clearing things out, and if
>> you reach 80% it'll start applying the brakes. This is here as a safeguard
>> because we've had data flows that can produce data much faster than it
>> can archive/delete the data. This is common for data flows that produce
>> huge numbers of files in the content repository. So that back-pressure is
>> there to ensure that the archive has a chance to run.
>>
>> This has always been here, though, ever since the initial open sourcing.
>> It's not something new. It may be the case that in later versions we have
>> become more efficient at creating the data, such that it's now exceeding
>> the rate at which cleanup can happen - not sure. But adjusting the
>> "nifi.content.repository.archive.max.usage.percentage" property should
>> get you into a better state.
>>
>> Thanks
>> -Mark
>>
>> > From: Shawn Weeks <[email protected]>
>> > Date: Mon, May 3, 2021 at 9:33 AM
>> > Subject: RE: NiFi Get's Stuck Waiting On Non Existent Archive Cleanup
>> > To: [email protected] <[email protected]>
>> >
>> > Note I have a 2-node cluster, which is why it's sitting at around
>> > 900 GB. Per node, the content repo is currently sitting at 535 GB, and
>> > I'm not sure where the rest of the space is. I have 472 GB free on each
>> > node in the content_repository partition, as shown in the Cluster panel.
>> >
>> > Thanks
>> >
>> > Shawn Weeks
>> >
>> > From: Shawn Weeks
>> > Sent: Monday, May 3, 2021 11:30 AM
>> > To: [email protected]
>> > Subject: NiFi Get's Stuck Waiting On Non Existent Archive Cleanup
>> >
>> > I'm not sure if this is specific to clustering or not, but using the
>> > default configuration with 50% content archiving, it is possible to
>> > cause NiFi to quit processing any data by simply filling up a queue with
>> > 50% of your content_repository storage. In my example, my
>> > content_repository is 1 TB, and once a queue gets to 500 GB or so, the
>> > next processor won't process any more data. Once this occurs, even
>> > stopping GenerateFlowFile won't fix the problem, and my CompressContent
>> > never does anything. It's my understanding that
>> > "nifi.content.repository.archive.max.usage.percentage" only sets the
>> > maximum amount of space that archives will use and should never prevent
>> > new content from being written, but in 1.13.2 it appears to be
>> > functioning as a reserve instead. I haven't seen this in older versions
>> > of NiFi like 1.9.2, and I'm not sure when the behavior changed, but even
>> > the documentation seems to indicate that this should not be happening.
>> > For example: 'If the archive is empty and content repository disk usage
>> > is above this percentage, then archiving is temporarily disabled.'
>> >
>> > <image001.png>
>> >
>> > <image002.png>
>> >
>> > Thanks
>> >
>> > Shawn Weeks
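[Editor's note] For readers landing on this thread later, the two properties Mark discusses would be set in nifi.properties roughly as follows. The 80% figure is Mark's illustration from the thread, not an official recommendation; tune it for your own disk sizes and flow rates:

```
# Archive cleanup starts once content repository usage reaches this level.
nifi.content.repository.archive.max.usage.percentage=50%

# Writes are blocked (back-pressure) at this level, giving cleanup a chance
# to run. When unset, it defaults to 2% above max.usage.percentage (52%).
nifi.content.repository.archive.backpressure.percentage=80%
```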
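[Editor's note] The two-threshold behavior Mark describes can be summarized in a short sketch. This is an illustrative model only, not NiFi's actual FileSystemRepository code, and the function name `repo_state` is hypothetical:

```python
def repo_state(used_pct: float,
               max_usage_pct: float = 50.0,
               backpressure_pct: float = 52.0) -> str:
    """Classify content-repository disk usage against the two thresholds.

    Defaults mirror the out-of-the-box values from the thread: cleanup at
    50%, back-pressure at 2% above that (52%).
    """
    if used_pct >= backpressure_pct:
        return "blocked"    # writers wait until cleanup frees space
    if used_pct >= max_usage_pct:
        return "cleaning"   # archive cleanup runs; writes still allowed
    return "normal"


# With the narrow default 50%/52% gap, a fast producer can overshoot the
# cleanup band and hit "blocked" almost immediately. Widening the gap
# (e.g. back-pressure at 80%) leaves cleanup far more headroom.
print(repo_state(51.0))                          # cleaning
print(repo_state(60.0))                          # blocked with defaults
print(repo_state(60.0, backpressure_pct=80.0))   # cleaning with a wider gap
```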
