Correction - it did work. I was expecting the file to be written to the folder I ran nifi.sh from, rather than to NIFI_HOME.
Reviewing it now...
Ryan

On Thu, Sep 17, 2020 at 1:51 PM Ryan Hendrickson <ryan.andrew.hendrick...@gmail.com> wrote:

> Hey Mark,
> I should have mentioned the PutElasticsearchHttp is going to 2 different clusters. We did play with different thread counts for each of them. At one point we were wondering if too large a Batch Size would make the threads block each other.
>
> It looks like PutElasticsearchHttp serializes every FlowFile to verify it's a well-formed JSON document [1]. That alone feels pretty CPU expensive. In our case, we already know we have valid JSON. Just as an anecdotal benchmark: a combination of [MergeContent + 2x InvokeHTTP] uses a total of 9 threads to accomplish the same thing that [2x DistributeLoad + 2x PutElasticsearchHTTP] does with 50 threads. The DistributeLoads need 5 threads each to keep up; PutElasticsearchHTTP needs about 10 each.
>
> PutElasticsearchHTTP is configured like this:
> Index: ${esIndex}
> Batch Size: 3000
> Index Operation: Index
>
> For the ./nifi.sh diagnostics --verbose diagnostics1.txt, I had to export TOOLS_JAR on the command line to the path where tools.jar was located.
>
> I'm not getting a file written out though. I still have the "full" NiFi up and running. I assume that's how it should be? Do I need to change my logback.xml levels at all?
>
> [1] https://github.com/apache/nifi/blob/aa741cc5967f62c3c38c2a47e712b7faa6fe19ff/nifi-nar-bundles/nifi-elasticsearch-bundle/nifi-elasticsearch-processors/src/main/java/org/apache/nifi/processors/elasticsearch/PutElasticsearchHttp.java#L299
>
> Thanks,
> Ryan
>
> On Thu, Sep 17, 2020 at 11:43 AM Mark Payne <marka...@hotmail.com> wrote:
>
>> Ryan,
>>
>> Why are you using DistributeLoad to go to two different PutElasticsearchHttp processors? Does that perform better for you than a single PutElasticsearchHttp processor with multiple concurrent tasks? It shouldn't really. I've never used that processor, but if two instances of the processor perform significantly better than one instance with two concurrent tasks, that's probably worth looking into.
>>
>> -Mark
>>
>> On Sep 17, 2020, at 11:38 AM, Ryan Hendrickson <ryan.andrew.hendrick...@gmail.com> wrote:
>>
>> @Joe I can't export the flow.xml.gz easily, although it's pretty simple. We put just the following on its own server because DistributeLoad (bug [1]) and PutElasticsearchHttp have a hard time keeping up.
>>
>> 1. Input Port
>> 2. ControlRate (data rate | 1.7GB | 5 min)
>> 3. Update Attributes (Delete Attribute Regex)
>> 4. JoltTransformJSON
>> 5. FlattenJSONArray (custom; takes a one-level JSON array and turns it into objects)
>> 6. DistributeLoad
>>    1. PutElasticsearchHttp
>>    2. PutElasticsearchHttp
>>
>> Unrelated: we're experimenting with a MergeContent + InvokeHTTP combo to see if that's more performant than PutElasticsearchHttp. The Elastic one uses an ObjectMapper, string replacements, etc. It seems to cap out around 2-3GB per 5 minutes.
>>
>> @Mark I'll check the diagnostics.
>>
>> @Jim definitely disk space 100% used.
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-1121
>>
>> Ryan
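To make the parsing overhead Ryan describes a little more concrete, here is a minimal, hypothetical Jackson sketch of a per-document well-formedness check of the kind he is pointing at. The class and method names are illustrative only; this is not the processor's actual code.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical illustration, not the processor's actual code: fully parsing every
// small FlowFile into a JsonNode tree just to confirm it is valid JSON costs CPU
// and short-lived heap, which adds up across millions of ~4 KB documents that are
// already known to be valid JSON.
public class JsonPrecheckSketch {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    static boolean isWellFormedJson(byte[] flowFileContent) {
        try {
            JsonNode node = MAPPER.readTree(flowFileContent); // full parse of the document
            return node != null;
        } catch (Exception e) {
            return false; // would route to the failure relationship in a real flow
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedJson("{\"esIndex\":\"logs-2020\"}".getBytes())); // true
        System.out.println(isWellFormedJson("not json".getBytes()));                    // false
    }
}

Skipping that kind of per-record validation is one plausible reason the MergeContent + InvokeHTTP combination Ryan mentions needs far fewer threads for the same throughput.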
>> On Thu, Sep 17, 2020 at 11:33 AM Williams, Jim <jwilli...@alertlogic.com> wrote:
>>
>>> Ryan,
>>>
>>> Is this maybe a case of exhausting inodes on the filesystem rather than exhausting the space available? If you do a 'df -i' on the system, what do you see for inode usage?
>>>
>>> Warm regards,
>>>
>>> Jim Williams | Manager, Site Reliability Engineering
>>> O: +1 713.341.7812 | C: +1 919.523.8767 | jwilli...@alertlogic.com | alertlogic.com
>>>
>>> From: Joe Witt <joe.w...@gmail.com>
>>> Sent: Thursday, September 17, 2020 10:19 AM
>>> To: users@nifi.apache.org
>>> Subject: Re: Content Claims Filling Disk - Best practice for small files?
>>>
>>> can you share your flow.xml.gz?
>>>
>>> On Thu, Sep 17, 2020 at 8:08 AM Ryan Hendrickson <ryan.andrew.hendrick...@gmail.com> wrote:
>>>
>>> 1.12.0
>>>
>>> Thanks,
>>> Ryan
>>>
>>> On Thu, Sep 17, 2020 at 11:04 AM Joe Witt <joe.w...@gmail.com> wrote:
>>>
>>> Ryan
>>>
>>> What version are you using? I do think we had an issue that kept items around longer than intended that has been addressed.
>>>
>>> Thanks
>>>
>>> On Thu, Sep 17, 2020 at 7:58 AM Ryan Hendrickson <ryan.andrew.hendrick...@gmail.com> wrote:
>>>
>>> Hello,
>>> I've got ~15 million FlowFiles, each roughly 4KB, totaling about 55GB of data on my canvas.
>>>
>>> However, the content repository (on its own partition) is completely full with 350GB of data. I'm pretty certain the way Content Claims store the data is responsible for this. In previous experience, we've had files that are larger, and haven't seen this as much.
>>>
>>> My guess is that as data streams through and is added to a claim, the claim isn't always released as the small files leave the canvas.
>>>
>>> We've run into this issue enough times that I figure there's probably a "best practice for small files" for the content claims settings.
>>>
>>> These are our current settings:
>>>
>>> nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
>>> nifi.content.claim.max.appendable.size=1 MB
>>> nifi.content.claim.max.flow.files=100
>>> nifi.content.repository.directory.default=/var/nifi/repositories/content
>>> nifi.content.repository.archive.max.retention.period=12 hours
>>> nifi.content.repository.archive.max.usage.percentage=50%
>>> nifi.content.repository.archive.enabled=true
>>> nifi.content.repository.always.sync=false
>>>
>>> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#content-repository
>>>
>>> There are 1024 folders on the disk (0-1023) for the Content Claims. Each file inside the folders is roughly 2 MB to 8 MB (which is odd, because I thought the max appendable size would keep them no larger than 1 MB).
>>>
>>> Is there a way to expand the number of folders and/or reduce the number of individual FlowFiles that are stored in the claims?
>>>
>>> I'm hoping there might be a best practice out there though.
>>>
>>> Thanks,
>>> Ryan
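As a rough illustration of the behavior Ryan is guessing at (a whole content claim file staying on disk as long as any FlowFile still references part of it), here is a small, hypothetical Java sketch. The names and numbers are illustrative, not NiFi's internal implementation.

import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not NiFi's internal implementation: one content
// claim file can hold the content of many small FlowFiles, and the file can
// only be deleted or archived once every FlowFile that references it has left
// the flow. A single lingering 4 KB FlowFile can therefore pin a multi-MB claim,
// which is one way 55 GB of live content could occupy 350 GB of repository disk.
public class ClaimPinningSketch {
    static final class Claim {
        final long sizeBytes;  // size of the backing file in the content repository
        int liveFlowFiles;     // FlowFiles still on the canvas that reference this claim
        Claim(long sizeBytes, int liveFlowFiles) {
            this.sizeBytes = sizeBytes;
            this.liveFlowFiles = liveFlowFiles;
        }
    }

    public static void main(String[] args) {
        // Say a 2 MB claim file once held ~500 four-KB documents, but only one is still queued.
        Map<String, Claim> claims = new HashMap<>();
        claims.put("claim-0001", new Claim(2L * 1024 * 1024, 1));

        long pinnedBytes = claims.values().stream()
                .filter(c -> c.liveFlowFiles > 0) // any live reference keeps the whole file on disk
                .mapToLong(c -> c.sizeBytes)
                .sum();
        long liveContentBytes = 4L * 1024;        // the one remaining FlowFile is only 4 KB

        System.out.printf("Pinned on disk: %d bytes to hold %d bytes of live content%n",
                pinnedBytes, liveContentBytes);
    }
}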