@Joe I can't easily export the flow.xml.gz, but the flow is pretty simple.
We put just the following on its own server because DistributeLoad (bug
[1]) and PutElasticsearchHttp have a hard time keeping up.

   1. Input Port
   2. ControlRate (data rate | 1.7GB | 5 min)
   3. UpdateAttribute (Delete Attribute Regex)
   4. JoltTransformJSON
   5. FlattenJSONArray (custom; takes a one-level JSON array and turns it
   into objects)
   6. DistributeLoad
      1. PutElasticsearchHttp
      2. PutElasticsearchHttp


Unrelated: we're experimenting with a MergeContent + InvokeHTTP combo to
see if that's more performant than PutElasticsearchHttp. The Elastic one
uses an ObjectMapper, string replacements, etc. It seems to cap out
around 2-3 GB per 5 minutes.
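
A minimal Python sketch of the request body such a MergeContent + InvokeHTTP combo would need to produce, assuming the target is Elasticsearch's `_bulk` endpoint (the thread does not say which endpoint is used). That endpoint takes newline-delimited JSON: an action line, then the document line, with a trailing newline at the end. The function name `build_bulk_body` and the index name `my-index` are hypothetical, and this is not a description of what PutElasticsearchHttp does internally.

```python
import json


def build_bulk_body(docs, index_name):
    """Build an NDJSON body for Elasticsearch's _bulk endpoint.

    Each document becomes two lines: an "index" action line followed by
    the document itself. A non-empty body ends with a newline.
    """
    lines = []
    for doc in docs:
        # Action line: tells Elasticsearch which index the next line goes to.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document, serialized on a single line.
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "my-index" is a made-up index name for illustration.
    docs = [{"id": 1, "msg": "a"}, {"id": 2, "msg": "b"}]
    print(build_bulk_body(docs, "my-index"), end="")
```

A body like this would be sent as one POST with `Content-Type: application/x-ndjson`, so one HTTP round trip covers many small documents.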

@Mark I'll check the diagnostics.

@Jim it's definitely disk space: 100% used.
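
For anyone wanting to tell the two cases apart programmatically, here is a small Python sketch that reads both block and inode usage for a filesystem via `os.statvfs` (Unix only). The function name `usage_percent` is made up for illustration, and the percentages are computed from total and free counts, so they may differ slightly from what `df` prints.

```python
import os


def usage_percent(path):
    """Return (block_pct_used, inode_pct_used) for the filesystem at path.

    Either value is None when the filesystem reports a total of zero
    (some filesystems do not expose a fixed inode count).
    """
    st = os.statvfs(path)
    block_pct = None
    inode_pct = None
    if st.f_blocks:
        # Space exhaustion: used blocks as a share of all blocks.
        block_pct = 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks
    if st.f_files:
        # Inode exhaustion: used inodes as a share of all inodes.
        inode_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files
    return block_pct, inode_pct
```

Calling `usage_percent("/var/nifi/repositories/content")` (the directory from the quoted settings) would show whether the blocks, the inodes, or both are near 100%.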

[1] https://issues.apache.org/jira/browse/NIFI-1121

Ryan

On Thu, Sep 17, 2020 at 11:33 AM Williams, Jim <jwilli...@alertlogic.com>
wrote:

> Ryan,
>
>
>
> Is this maybe a case of exhausting inodes on the filesystem rather
> than exhausting the space available?  If you do a ‘df -i’ on the system,
> what do you see for inode usage?
>
>
>
> Warm regards,
>
>
>
> *Jim Williams* | Manager, Site Reliability Engineering
>
>
>
>
>
>
> *From:* Joe Witt <joe.w...@gmail.com>
> *Sent:* Thursday, September 17, 2020 10:19 AM
> *To:* users@nifi.apache.org
> *Subject:* Re: Content Claims Filling Disk - Best practice for small
> files?
>
>
>
> can you share your flow.xml.gz?
>
>
>
> On Thu, Sep 17, 2020 at 8:08 AM Ryan Hendrickson <
> ryan.andrew.hendrick...@gmail.com> wrote:
>
> 1.12.0
>
>
>
> Thanks,
>
> Ryan
>
>
>
> On Thu, Sep 17, 2020 at 11:04 AM Joe Witt <joe.w...@gmail.com> wrote:
>
> Ryan
>
>
>
> What version are you using? I do think we had an issue that kept items
> around longer than intended that has been addressed.
>
>
>
> Thanks
>
>
>
> On Thu, Sep 17, 2020 at 7:58 AM Ryan Hendrickson <
> ryan.andrew.hendrick...@gmail.com> wrote:
>
> Hello,
>
> I've got ~15 million FlowFiles, each roughly 4KB, totaling about 55GB of
> data on my canvas.
>
>
>
> However, the content repository (on its own partition) is completely full
> with 350GB of data.  I'm pretty certain the way Content Claims store the
> data is responsible for this.  In the past, we've had larger files and
> haven't seen this as much.
>
>
>
> My guess is that as data was streaming through and being added to a claim,
> the claim isn't always released as the small files leave the canvas.
>
>
>
> We've run into this issue enough times that I figure there's probably a
> "best practice for small files" for the content claims settings.
>
>
>
> These are our current settings:
>
>
> nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
>
> nifi.content.claim.max.appendable.size=1 MB
>
> nifi.content.claim.max.flow.files=100
>
> nifi.content.repository.directory.default=/var/nifi/repositories/content
>
> nifi.content.repository.archive.max.retention.period=12 hours
>
> nifi.content.repository.archive.max.usage.percentage=50%
>
> nifi.content.repository.archive.enabled=true
>
> nifi.content.repository.always.sync=false
>
>
>
>
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#content-repository
>
>
>
>
> There are 1024 folders on the disk (0-1023) for the Content Claims.
>
> Each file inside the folders is roughly 2 MB to 8 MB (which is odd
> because I thought the max appendable size would make this no larger than
> 1 MB).
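
The numbers above can be put side by side with a short Python sketch. It uses only the figures from the message, plus one labeled assumption: that a claim file stays on disk as long as at least one FlowFile still points into it, which is the guess made earlier in the message. The helper `retained_ratio` and the "one 4 KB FlowFile left in an 8 MB file" scenario are hypothetical illustrations, not measurements.

```python
KB = 1024
MB = 1024 ** 2
GB = 1024 ** 3

# Figures quoted in the message
flowfile_count = 15_000_000   # "~15 million FlowFiles"
flowfile_size = 4 * KB        # "each roughly 4KB"
repo_on_disk = 350 * GB       # content repository usage

logical_bytes = flowfile_count * flowfile_size
amplification = repo_on_disk / logical_bytes


def retained_ratio(claim_file_bytes, live_bytes):
    """Bytes kept on disk per byte still referenced, for one claim file.

    ASSUMPTION: the whole claim file is retained while any part of it
    is still referenced by a FlowFile.
    """
    if live_bytes <= 0:
        return 0.0  # nothing references the file, so it can be reclaimed
    return claim_file_bytes / live_bytes


print(f"logical data: {logical_bytes / GB:.1f} GiB")
print(f"on-disk / logical: {amplification:.1f}x")
# Hypothetical worst case: one 4 KB FlowFile left in an 8 MB claim file
print(f"worst case for one file: {retained_ratio(8 * MB, 4 * KB):.0f}x")
```

Under that assumption, a handful of long-lived small FlowFiles per claim file would be enough to explain a repository several times larger than the data on the canvas.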
>
>
>
> Is there a way to expand the number of folders and/or reduce the number of
> individual FlowFiles that are stored in each claim?
>
>
>
> I'm hoping there might be a best practice out there though.
>
>
>
> Thanks,
>
> Ryan
>
>
>
