Apologies, I forgot to mention that I am on NiFi 1.1.2 on Linux RHEL 6.5. Thanks, Olav
Olav Jordens
Senior ETL Developer
Two Degrees Mobile Limited

________________________________

From: Olav Jordens
Sent: Tuesday, 25 April 2017 1:27 p.m.
To: 'users@nifi.apache.org' <users@nifi.apache.org>
Subject: Content repository filling up

Hi Users,

I have had this problem intermittently for some time now: the content repository disk fills up even though there appear to be very few flow files in the system. I have read the very good explanation of content claims here:

https://community.hortonworks.com/articles/82308/understanding-how-nifis-content-repository-archivi.html

My data flows include a mix of very large and very small files, so I suspect that the small files within a claim are locking the large ones. I have followed the suggestion in the above link: "If you are working with data that ranges greatly from very small to very large, you may want to decrease the max appendable size and/or max flow file settings. By doing so you decrease the number of FlowFiles that make it into a single claim. This in turn reduces the likelihood of a single piece of data keeping large amounts of data still active in your content repository."

I have tried the most radical approach: one content claim per file. I believe this should mean that as soon as a large file leaves the flow, its claim is available for removal, since I have set archiving to false.

My issue is that even with these settings, the NiFi content repository still fills up, and when I look inside the content repository I see the contents of multiple flow files within a single claim file. This is unexpected, since I have set nifi.content.claim.max.flow.files=1.

These are my content repository settings in nifi.properties:

# Content Repository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
# Exceptionally important to get this right when having a mix of large and small files.
# We don't want a large file to be in the same claim as a small file which remains queued:
# the claim can never be released until the small file is no longer enqueued and has been released.
# Large files, first into a claim, will take up an entire claim anyway.
# So with max.flow.files=1 there is no need to configure max.appendable.size.
nifi.content.claim.max.appendable.size=10 MB
#nifi.content.claim.max.flow.files=100
nifi.content.claim.max.flow.files=1
#OPT
#nifi.content.repository.directory.default=./content_repository
nifi.content.repository.directory.default=/app/nifi/common/content_repository
# Archiving of content is disabled - no need to keep data hanging around once the flow is complete.
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
#nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.enabled=false
nifi.content.repository.always.sync=false
nifi.content.viewer.url=/nifi-content-viewer/

Am I looking at this incorrectly?

Thanks,
Olav
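PS: for anyone who wants to check their own repository, something like the following lists the largest claim files on disk (a quick sketch using GNU find, which ships with RHEL; REPO is the nifi.content.repository.directory.default value from the settings above):

```shell
# List the 20 largest claim files under the content repository.
# REPO is the value of nifi.content.repository.directory.default in nifi.properties.
REPO=/app/nifi/common/content_repository
find "$REPO" -type f -printf '%s\t%p\n' | sort -rn | head -n 20
```

If a claim file near the top is much larger than any flow file still queued, that claim is likely being held open by a small flow file sharing it.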