@Joe I can't easily export the flow.xml.gz, although the flow is pretty simple. We put just the following on its own server because DistributeLoad (bug [1]) and PutElasticsearchHttp have a hard time keeping up.

1. Input Port
2. ControlRate (data rate | 1.7GB | 5 min)
3. UpdateAttribute (Delete Attribute Regex)
4. JoltTransformJSON
5. FlattenJSONArray (custom; takes a one-level JSON array and turns it into objects)
6. DistributeLoad
   1. PutElasticsearchHttp
   2. PutElasticsearchHttp

Unrelated: we're experimenting with a MergeContent + InvokeHTTP combo to see if that's more performant than PutElasticsearchHttp. The Elasticsearch processor uses an ObjectMapper, string replacements, etc. It seems to cap out around 2-3GB per 5 minutes.

@Mark I'll check the diagnostics.

@Jim It's definitely disk space: 100% used.

[1] https://issues.apache.org/jira/browse/NIFI-1121

Ryan

On Thu, Sep 17, 2020 at 11:33 AM Williams, Jim <jwilli...@alertlogic.com> wrote:

> Ryan,
>
> Could this be a case of exhausting inodes on the filesystem rather than
> exhausting the space available? If you run ‘df -i’ on the system, what do
> you see for inode usage?
>
> Warm regards,
>
> *Jim Williams* | Manager, Site Reliability Engineering
>
> *From:* Joe Witt <joe.w...@gmail.com>
> *Sent:* Thursday, September 17, 2020 10:19 AM
> *To:* users@nifi.apache.org
> *Subject:* Re: Content Claims Filling Disk - Best practice for small
> files?
>
> can you share your flow.xml.gz?
>
> On Thu, Sep 17, 2020 at 8:08 AM Ryan Hendrickson <ryan.andrew.hendrick...@gmail.com> wrote:
>
> 1.12.0
>
> Thanks,
> Ryan
>
> On Thu, Sep 17, 2020 at 11:04 AM Joe Witt <joe.w...@gmail.com> wrote:
>
> Ryan
>
> What version are you using? I do think we had an issue that kept items
> around longer than intended that has been addressed.
>
> Thanks
>
> On Thu, Sep 17, 2020 at 7:58 AM Ryan Hendrickson <ryan.andrew.hendrick...@gmail.com> wrote:
>
> Hello,
>
> I've got ~15 million FlowFiles, each roughly 4KB, totaling about 55GB of
> data on my canvas.
>
> However, the content repository (on its own partition) is completely full
> with 350GB of data. I'm pretty certain the way Content Claims store the
> data is responsible for this. In previous experience, we've had files that
> are larger, and haven't seen this as much.
>
> My guess is that as data was streaming through and being added to a claim,
> the claim isn't always released as the small files leave the canvas.
>
> We've run into this issue enough times that I figure there's probably a
> "best practice for small files" for the content claims settings.
>
> These are our current settings:
>
> nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
> nifi.content.claim.max.appendable.size=1 MB
> nifi.content.claim.max.flow.files=100
> nifi.content.repository.directory.default=/var/nifi/repositories/content
> nifi.content.repository.archive.max.retention.period=12 hours
> nifi.content.repository.archive.max.usage.percentage=50%
> nifi.content.repository.archive.enabled=true
> nifi.content.repository.always.sync=false
>
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#content-repository
>
> There are 1024 folders on the disk (0-1023) for the Content Claims.
>
> Each file inside the folders is roughly 2MB to 8MB (which is odd
> because I thought the max appendable size would make this no larger than
> 1MB).
>
> Is there a way to expand the number of folders and/or reduce the number of
> individual FlowFiles that are stored in the claims?
>
> I'm hoping there might be a best practice out there though.
>
> Thanks,
> Ryan
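
For anyone who wants to check both questions raised in this thread (space versus inode exhaustion, and how large the files under the content repository really are), here is a minimal Python sketch. It is only an illustration: the function names `filesystem_usage` and `claim_file_summary` are made up for this example, the default path is the `nifi.content.repository.directory.default` value quoted above, and the 1 MB threshold simply mirrors the quoted `nifi.content.claim.max.appendable.size` setting. It reads the filesystem only and does not use any NiFi API. `os.statvfs` is available on Unix-like systems only.

```python
import os
import sys


def filesystem_usage(path):
    """Return the percentage of blocks and inodes in use on the filesystem holding path."""
    st = os.statvfs(path)

    def pct(total, free):
        # Some filesystems report 0 total inodes; avoid dividing by zero.
        return 0.0 if total == 0 else 100.0 * (total - free) / total

    return {
        "space_used_pct": pct(st.f_blocks, st.f_bfree),
        "inodes_used_pct": pct(st.f_files, st.f_ffree),
    }


def claim_file_summary(repo_dir, limit_bytes=1024 * 1024):
    """Walk repo_dir and count files, total bytes, and files larger than limit_bytes."""
    files = 0
    total_bytes = 0
    over_limit = 0
    for root, _dirs, names in os.walk(repo_dir):
        for name in names:
            try:
                size = os.path.getsize(os.path.join(root, name))
            except OSError:
                # The file may have been removed while we were scanning.
                continue
            files += 1
            total_bytes += size
            if size > limit_bytes:
                over_limit += 1
    return {"files": files, "total_bytes": total_bytes, "over_limit": over_limit}


if __name__ == "__main__":
    # Assumption: default path taken from the settings quoted in this thread.
    repo = sys.argv[1] if len(sys.argv) > 1 else "/var/nifi/repositories/content"
    if os.path.isdir(repo):
        print(filesystem_usage(repo))
        print(claim_file_summary(repo))
    else:
        print("not a directory:", repo)
```

Run it with the content repository path as the only argument. If `inodes_used_pct` is far below 100 while `space_used_pct` is at 100, the partition is out of space rather than out of inodes, and the `over_limit` count shows how many files exceed the threshold.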