Ryan,

Thanks. So 1.12.0 has no known issues with the content repo not being cleaned up properly.
As you pointed out, nifi.content.claim.max.appendable.size is intended to cap the amount of data that will be written to a single file. However, it does come with a couple of caveats:

(1) Once the cap is reached, no more FlowFiles are added to the stream, but a write that has already started does not spill over to another stream. So, with the cap set to 1 MB, you may write 100 FlowFiles of 4 KB each and then write a 4 MB FlowFile to the same claim. The file will be about 4.4 MB in size, and it won't be cleaned up until all 101 FlowFiles have left your system.

(2) The cap only takes effect between Process Sessions. That is, if a Processor handles many FlowFiles in a single session, they can all be written to a single file. Generally, this can happen if you set the Run Duration to a high value. For example, if Run Duration is set to 1 second and there are enough FlowFiles to keep the Processor busy for a full second, all of those FlowFiles could be written to the same file on disk.

Also of note, the files are only cleaned up when the FlowFile Repository checkpoints. This is determined by the "nifi.flowfile.repository.checkpoint.interval" property, which defaults to 20 seconds in 1.12.0; if you have a larger value there, you may want to decrease it.

One thing that might be of interest in understanding why the content claims still exist in the repo is to run:

    bin/nifi.sh diagnostics --verbose diagnostics1.txt

That will write out a file, diagnostics1.txt, that has lots of diagnostic information. This includes which FlowFiles are referencing each file in the content repository, i.e., which FlowFiles must finish processing before the file can be cleaned up.

Hope this helps!
-Mark

On Sep 17, 2020, at 11:07 AM, Ryan Hendrickson <ryan.andrew.hendrick...@gmail.com> wrote:

1.12.0

Thanks,
Ryan

On Thu, Sep 17, 2020 at 11:04 AM Joe Witt <joe.w...@gmail.com> wrote:

Ryan

What version are you using?
I do think we had an issue that kept items around longer than intended; that has been addressed.

Thanks

On Thu, Sep 17, 2020 at 7:58 AM Ryan Hendrickson <ryan.andrew.hendrick...@gmail.com> wrote:

Hello,

I've got ~15 million FlowFiles, each roughly 4 KB, totaling about 55 GB of data on my canvas. However, the content repository (on its own partition) is completely full with 350 GB of data. I'm pretty certain the way Content Claims store the data is responsible for this. In previous experience, we've had larger files and haven't seen this as much. My guess is that as data streams through and is added to a claim, the claim isn't always released as the small files leave the canvas.

We've run into this issue enough times that I figure there's probably a "best practice for small files" for the content claim settings. These are our current settings:

nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=1 MB
nifi.content.claim.max.flow.files=100
nifi.content.repository.directory.default=/var/nifi/repositories/content
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#content-repository

There are 1024 folders on the disk (0-1023) for the Content Claims. Each file inside the folders is roughly 2 MB to 8 MB (which is odd, because I thought the max appendable size would keep them no larger than 1 MB).

Is there a way to expand the number of folders and/or reduce the number of individual FlowFiles that are stored in the claims? I'm hoping there might be a best practice out there, though.

Thanks,
Ryan
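[Editor's note] The claim-packing behavior described in Mark's reply above (a claim can grow past the cap because the write that crosses it is never split, so a 1 MB cap can still produce a ~4.4 MB file) can be sketched with a toy model. This is purely illustrative and is not NiFi's actual FileSystemRepository code; the function name and the simplified "one open claim at a time" assumption are inventions for the example:

```python
def simulate_claims(flowfile_sizes, max_appendable=1_000_000):
    """Toy model of content-claim packing as described in the thread.

    Each FlowFile's content is appended to the currently open claim.
    The cap is only checked AFTER a write completes: the write that
    pushes the claim past the cap is never split across claims, so a
    claim can end up much larger than max_appendable.
    """
    claims = []   # sizes (in bytes) of closed claim files
    current = 0   # bytes written to the open claim so far
    for size in flowfile_sizes:
        current += size              # the write always goes to the open claim
        if current >= max_appendable:
            claims.append(current)   # cap reached: close this claim
            current = 0              # next write starts a fresh claim
    if current:
        claims.append(current)       # a partially filled claim remains open
    return claims

# Mark's example: 100 FlowFiles of 4 KB (~400 KB, still under the 1 MB
# cap), then one 4 MB FlowFile appended to the same claim:
sizes = [4_000] * 100 + [4_000_000]
print(simulate_claims(sizes))  # one claim of about 4.4 MB (4,400,000 bytes)
```

Under this model, the claim is only released for cleanup once every FlowFile it contains has left the flow, which matches the 2-8 MB claim files observed even with a 1 MB max appendable size.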