I've added a ticket to the NiFi Jira outlining all the missing properties
from the Sys Admin Guide.  We'd really appreciate them getting into the
documentation.

https://issues.apache.org/jira/browse/NIFI-9029

As for this issue, it seems like a pretty repeatable process to lock up
the canvas and freeze all processing.  It would be really great if there
were a visual indicator on the canvas to report this to users.
https://issues.apache.org/jira/browse/NIFI-9030

Perhaps the default 2% should also be changed?  That seems like a fairly
small margin above the high-water mark.
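
For reference, here is a minimal nifi.properties sketch of the two settings discussed downthread; the 80% value is only an illustration of the extra headroom Mark suggests, not a recommended default:

```properties
# High-water mark: archive cleanup starts once the content repository
# partition passes this usage percentage (NiFi default: 50%)
nifi.content.repository.archive.max.usage.percentage=50%

# Hard stop: writes to the content repository are back-pressured at this
# usage percentage. Defaults to max usage + 2% (i.e. 52%); raising it,
# e.g. to 80%, gives archive cleanup more room to catch up.
nifi.content.repository.archive.backpressure.percentage=80%
```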

Thanks,
Ryan

On Fri, Aug 6, 2021 at 12:08 PM Ryan Hendrickson <
[email protected]> wrote:

> Elli replied to this, although it looks like his email got flagged for
> spam, so I'm replying with his comments to make sure it got through:
>
> We are experiencing this issue as well. We just upgraded from NiFi 1.11.4
> to 1.13.2, and are running into this issue where many of our high-usage
> NiFi instances are just hanging. For example, we have a 7-node cluster that
> has FlowFiles stuck in queues and not moving. We noticed that on 3 of those
> nodes, the FlowFile content storage was over 50%, and those are the nodes
> that have FlowFiles stuck in the queue. The other nodes have nothing on
> them. No new data is flowing into the cluster at all, and nothing is
> moving on any of the nodes. We see this problem on non-clustered
> machines as well; the cluster just makes it more obvious that this archive max
> usage percentage might be the cause.
>
> We have a lot of MergeContent processors. We realize that there were a
> lot of I/O improvements in the newer versions of NiFi - Joe, we suspect
> these efficiencies might be exacerbating the problem:
>
> *NiFi 1.13.1 - [full_list]*
>
>    - [NIFI-7646] - Improve performance of MergeContent / others that read
>    content of many small FlowFiles
>    - [NIFI-8222] - When processing a lot of small FlowFiles, Provenance
>    Repo spends most of its time in lock contention. That can be improved.
>
>
> *NiFi 1.14.0 - [full list]*
>
>    - [NIFI-8633] - Content Repository can be improved to make fewer disks
>    accesses on read.
>       - Mark Payne's notes:
>
> *"For those interested in the actual performance numbers here, I ran a
>       pretty simple flow that generated a lot of tiny JSON messages, and then
>       used ConvertRecord to convert from JSON to Avro. Ran a profiler against it
>       and found that about 50% of the time for ConvertRecord was spent in
>       FileSystemRepository.read(). This is called twice - once when we read the
>       data for inferring schema, a second time when we parse the data. Of the
>       time spent in FileSystemRepository.read(), about 50% of that time was spent
>       in Files.exists(). So this should improve performance of that flow by
>       something like 25%"*
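
Mark's "something like 25%" is just the product of the two profiled fractions; a quick sanity check of that arithmetic (fractions taken from the note above, and approximate):

```python
# Approximate fractions from Mark Payne's profiling note:
read_fraction = 0.50    # share of ConvertRecord time in FileSystemRepository.read()
exists_fraction = 0.50  # share of read() time spent in Files.exists()

# Removing the Files.exists() check saves this fraction of total flow time:
expected_saving = read_fraction * exists_fraction
print(expected_saving)  # 0.25 -> "something like 25%"
```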
>
>
> We didn't know about the ...archive.backpressure.percentage property - we
> don't see it in the Admin Guide at
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html. We
> will set this property much higher than the default 2% above the max usage
> percentage and see how it goes. Now that we think about it, we believe
> we've experienced this problem occasionally before the upgrade, but it has
> become very frequent since the upgrade.
>
>
>
> On Mon, May 3, 2021 at 1:09 PM Shawn Weeks <[email protected]>
> wrote:
>
>> Sorry, I wasn't saying that
>> 'nifi.content.repository.archive.max.usage.percentage' was new; I just
>> hadn't managed to get a NiFi instance stuck this way, and even the
>> documentation says that if the archive is empty and the content repo needs more
>> room it would disable the archive. I'm having trouble finding where
>> 'nifi.content.repository.archive.backpressure.percentage' is documented.
>>
>> Thanks
>>
>> -----Original Message-----
>> From: Mark Payne <[email protected]>
>> Sent: Monday, May 3, 2021 12:00 PM
>> To: [email protected]
>> Subject: Re: NiFi Get's Stuck Waiting On Non Existent Archive Cleanup
>>
>> Shawn,
>>
>> There are a couple of properties at play. The
>> “nifi.content.repository.archive.max.usage.percentage" property behaves as
>> you have described. But there’s also a second property:
>> nifi.content.repository.archive.backpressure.percentage
>> This controls at what point the Content Repository will actually apply
>> back-pressure in order to avoid filling the disk. This property defaults to
>> 2% more than the max.usage.percentage. So by default it uses 50% and
>> 52%.
>> You can adjust the backpressure percentage to something much higher like
>> 80%. So then if you reach 50% it would start clearing things out, and if
>> you reach 80% it’ll start applying the brakes. This is here as a safeguard
>> because we’ve had data flows that can produce the data much faster than it
>> could archive/delete the data. This is common for data flows that produce
>> huge numbers of files in the content repository. So that backpressure is
>> there to ensure that the archive has a chance to run.
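
As a sketch only (this models the thresholds Mark describes, not NiFi's actual implementation), the repository's reaction to disk usage could be pictured as:

```python
def content_repo_state(usage_pct: float,
                       max_usage_pct: float = 50.0,
                       backpressure_pct: float = 52.0) -> str:
    """Toy model of the two thresholds. Defaults mirror NiFi's 50% / 52%
    (backpressure defaults to max usage + 2%)."""
    if usage_pct >= backpressure_pct:
        return "backpressure"  # writes are held until cleanup frees space
    if usage_pct >= max_usage_pct:
        return "archiving"     # archive cleanup is clearing old content
    return "normal"

# With Mark's suggested 80% backpressure setting, cleanup still starts at
# 50%, but the brakes are not applied until 80%:
print(content_repo_state(60.0, backpressure_pct=80.0))  # archiving
print(content_repo_state(85.0, backpressure_pct=80.0))  # backpressure
```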
>>
>> This has always been here, though, ever since the initial open sourcing.
>> It’s not something new. It may be the case that in later versions we have
>> been more efficient at creating the data, such that it’s now exceeding the
>> rate at which the cleanup can happen - not sure. But adjusting the
>> “nifi.content.repository.archive.backpressure.percentage” property should
>> get you into a better state.
>>
>> Thanks
>> -Mark
>>
>>
>> > From: Shawn Weeks <[email protected]>
>> > Date: Mon, May 3, 2021 at 9:33 AM
>> > Subject: RE: NiFi Get's Stuck Waiting On Non Existent Archive Cleanup
>> > To: [email protected] <[email protected]>
>> >
>> >
>> > Note I have a 2-node cluster, which is why it’s sitting at around 900
>> GB. Per node, the content repo is currently sitting at 535 GB, and I’m not
>> sure where the rest of the space is. I have 472 GB free on each node in the
>> content_repository partition, as shown in the Cluster panel.
>> >
>> >
>> >
>> > Thanks
>> >
>> > Shawn Weeks
>> >
>> >
>> >
>> > From: Shawn Weeks
>> > Sent: Monday, May 3, 2021 11:30 AM
>> > To: [email protected]
>> > Subject: NiFi Get's Stuck Waiting On Non Existent Archive Cleanup
>> >
>> >
>> >
>> > I’m not sure if this is specific to clustering or not, but using the
>> default configuration with 50% content archiving, it is possible to make
>> NiFi quit processing any data simply by filling up a queue with 50% of
>> your content_repository storage. In my example, my content_repository is 1 TB,
>> and once a queue gets to 500 GB or so, the next processor won’t process any
>> more data. Once this occurs, even stopping GenerateFlowFile won’t fix the
>> problem, and my CompressContent never does anything. It’s my understanding
>> that “nifi.content.repository.archive.max.usage.percentage” only sets the
>> maximum amount of space that archives will use and should never prevent new
>> content from being written; in 1.13.2 it appears to be functioning as a
>> reserve instead. I haven’t seen this in older versions of NiFi like 1.9.2,
>> and I’m not sure when the behavior changed, but even the documentation seems
>> to indicate that this should not be happening. For example: ‘If the archive
>> is empty and content repository disk usage is above this percentage, then
>> archiving is temporarily disabled.’
>> >
>> >
>> >
>> > <image001.png>
>> >
>> >
>> >
>> > <image002.png>
>> >
>> >
>> >
>> > Thanks
>> >
>> > Shawn Weeks
>> >
>>
>>
