Apologies, I forgot to mention that I am on NiFi 1.1.2 on Linux RHEL 6.5. Thanks, Olav
Olav Jordens
Senior ETL Developer
Two Degrees Mobile Limited

________________________________

From: Olav Jordens
Sent: Tuesday, 25 April 2017 1:27 p.m.
To: 'users@nifi.apache.org' <users@nifi.apache.org>
Subject: Content repository filling up

Hi Users,

I have had this problem intermittently for some time now: the content repository disk fills up even though there appear to be very few flow files in the system. I have read the very good explanation of content claims here:

https://community.hortonworks.com/articles/82308/understanding-how-nifis-content-repository-archivi.html

My data flows include a mix of very large and very small files, so I suspect that the small files within a claim are locking the large ones. I have followed the suggestion in the above link: "If you are working with data that ranges greatly from very small to very large, you may want to decrease the max appendable size and/or max flow file settings. By doing so you decrease the number of FlowFiles that make it into a single claim. This in turn reduces the likelihood of a single piece of data keeping large amounts of data still active in your content repository."

I have tried the most radical approach: one content claim per file. I believe this should mean that as soon as a large file leaves the flow, its claim is available for removal, since I have set archiving to false.

My issue is that even with these settings, the NiFi content repository still fills up, and when I look inside the content repository I see the contents of multiple flow files within a single claim file. This is unexpected, since I have set nifi.content.claim.max.flow.files=1.

These are my content repository settings in nifi.properties:

# Content Repository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
# Exceptionally important to get this right when having a mix of large and small files.
# We don't want a large file to be in the same claim as a small file which remains queued:
# the claim can never be released until the small file is no longer enqueued and has been released.
# Large files, first into a claim, will take up an entire claim anyway.
# So with max.flow.files=1 there is no need to configure max.appendable.size.
nifi.content.claim.max.appendable.size=10 MB
#nifi.content.claim.max.flow.files=100
nifi.content.claim.max.flow.files=1
#OPT
#nifi.content.repository.directory.default=./content_repository
nifi.content.repository.directory.default=/app/nifi/common/content_repository
# Archiving of content is disabled - no need to keep data hanging around once the flow is complete.
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
#nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.enabled=false
nifi.content.repository.always.sync=false
nifi.content.viewer.url=/nifi-content-viewer/

Am I looking at this incorrectly?

Thanks,
Olav
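PS: for anyone who wants to check their own repository, something like the following lists the largest claim files on disk (a quick sketch using GNU find, which ships with RHEL; REPO is the nifi.content.repository.directory.default value from the settings above):

```shell
# List the 20 largest claim files under the content repository.
# REPO is the value of nifi.content.repository.directory.default in nifi.properties.
REPO=/app/nifi/common/content_repository
find "$REPO" -type f -printf '%s\t%p\n' | sort -rn | head -n 20
```

If a claim file near the top is much larger than any flow file still queued, that claim is likely being held open by a small flow file sharing it.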