Michael Moser created NIFI-3376:
-----------------------------------

             Summary: Implement content repository ResourceClaim compaction
                 Key: NIFI-3376
                 URL: https://issues.apache.org/jira/browse/NIFI-3376
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
    Affects Versions: 1.1.1, 0.7.1
            Reporter: Michael Moser


On NiFi systems that handle many files smaller than 1 MB, we often see that 
the actual disk usage of the content_repository is much greater than the total 
size of the flowfiles that NiFi reports in its queues.  As an example, NiFi may 
report "50,000 / 12.5 GB" but the content_repository takes up 240 GB of its 
file system.  This leads to scenarios where a 500 GB content_repository file 
system becomes 100% full, and the user's reaction is "I only had 40 GB of data 
in my NiFi!"

When several content claims exist in a single resource claim, and most but not 
all of those content claims are terminated, the entire resource claim remains 
ineligible for deletion or archive.  This could mean that a 1 MB resource claim 
stays on disk while only one 10 KB content claim inside it is counted by NiFi 
as existing in its queues.
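
The accounting gap described above can be illustrated with a small model.  
This is only a sketch, not NiFi code: the ResourceClaim class and its fields 
are hypothetical names chosen for illustration, and the claim sizes are taken 
from the example in the text (10 KB content claims in a resource claim of 
roughly 1 MB).

```python
from dataclasses import dataclass, field

KB = 1024


@dataclass
class ResourceClaim:
    """Hypothetical model of one resource claim file on disk."""
    live_claims: dict = field(default_factory=dict)  # claim id -> size in bytes
    file_size: int = 0                               # bytes occupied on disk

    def append(self, claim_id, size):
        # A new content claim is appended to the shared file.
        self.live_claims[claim_id] = size
        self.file_size += size

    def terminate(self, claim_id):
        # Terminating a content claim drops its reference,
        # but the bytes stay in the file.
        del self.live_claims[claim_id]

    @property
    def live_bytes(self):
        # What the queues would report for this resource claim.
        return sum(self.live_claims.values())

    @property
    def deletable(self):
        # Eligible for deletion/archive only when no content claim remains.
        return not self.live_claims


rc = ResourceClaim()
for claim_id in range(100):
    rc.append(claim_id, 10 * KB)      # 100 content claims of 10 KB each
for claim_id in range(99):
    rc.terminate(claim_id)            # all but one are terminated

print(rc.live_bytes // KB, rc.file_size // KB, rc.deletable)
# prints: 10 1000 False
```

In this model one surviving 10 KB content claim keeps 1000 KB on disk, which 
is the kind of ratio behind the 12.5 GB versus 240 GB observation.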

If a particular flow has a slow egress point where flowfiles could back up and 
remain on the system longer than expected, this problem is exacerbated.

A potential solution is to compact resource claim files on disk. A background 
thread could examine all resource claims and rewrite the file of any resource 
claim that has become "old" and whose active content claim usage has dropped 
below a threshold, so that the rewritten file holds only the content claims 
that are still active.
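
A minimal sketch of what such a compaction pass might look like follows.  
Everything in it is an assumption made for illustration: the function names, 
the one-hour age limit, the 25% usage threshold, and the representation of 
active content claims as (offset, length) pairs are not taken from NiFi.  A 
real implementation would also have to update the offsets recorded for each 
flowfile and coordinate with concurrent readers, which this sketch ignores.

```python
import os


def should_compact(file_size, live_bytes, age_seconds,
                   min_age_seconds=3600, max_live_ratio=0.25):
    """Decide whether a resource claim is 'old' and mostly dead.

    The default thresholds are hypothetical values, not NiFi settings.
    """
    if file_size == 0 or live_bytes == 0:
        # Nothing live: the claim is already eligible for deletion/archive.
        return False
    return (age_seconds >= min_age_seconds
            and live_bytes / file_size < max_live_ratio)


def compact(path, live_claims):
    """Rewrite the file at `path` so it holds only the live content claims.

    live_claims maps a claim id to its (offset, length) in the current file.
    Returns a dict with the new (offset, length) of each claim.
    """
    tmp_path = path + ".compact"
    new_locations = {}
    in_offset_order = sorted(live_claims.items(), key=lambda item: item[1][0])
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        for claim_id, (offset, length) in in_offset_order:
            src.seek(offset)
            new_locations[claim_id] = (dst.tell(), length)
            dst.write(src.read(length))
    # Swap the compacted file into place once it is fully written.
    os.replace(tmp_path, path)
    return new_locations
```

A background thread would call should_compact for each resource claim and 
pass the qualifying ones to compact, then record the returned offsets.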

A potential work-around is to make the FileSystemRepository 
MAX_APPENDABLE_CLAIM_LENGTH configurable so that it can be set to a smaller 
number.  This would increase the probability that the content claim reference 
count of a resource claim reaches 0 and the resource claim becomes eligible 
for deletion/archive.  Users could then trade off performance for more 
accurate accounting of NiFi queue size relative to content repository size.
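
The effect of a smaller limit can be estimated with a simple packing model.  
This is a sketch under stated assumptions, not NiFi's actual allocation logic: 
retained_bytes is a hypothetical helper, a resource claim is assumed to accept 
content claims until its size reaches the limit, and the workload (1000 
content claims of 10 KB, of which every 50th is still active) is invented for 
the example.

```python
KB = 1024


def retained_bytes(claim_sizes, live_indexes, max_appendable):
    """Bytes held on disk by resource claims that still contain a live claim.

    Content claims are packed in arrival order; a resource claim is closed
    once its size reaches max_appendable (an assumed packing rule).
    """
    retained = 0
    current_size = 0
    current_has_live = False
    for index, size in enumerate(claim_sizes):
        current_size += size
        current_has_live = current_has_live or index in live_indexes
        if current_size >= max_appendable:
            if current_has_live:
                retained += current_size
            current_size = 0
            current_has_live = False
    if current_has_live:
        # The last, partially filled resource claim.
        retained += current_size
    return retained


sizes = [10 * KB] * 1000            # 1000 content claims of 10 KB
live = set(range(0, 1000, 50))      # every 50th claim is still queued (200 KB)

large_limit = retained_bytes(sizes, live, 1000 * KB)
small_limit = retained_bytes(sizes, live, 100 * KB)
print(large_limit // KB, small_limit // KB)
# prints: 10000 2000
```

With the larger limit every resource claim holds at least one active content 
claim, so all 10000 KB stay on disk for 200 KB of queued data.  With the 
smaller limit only 2000 KB is retained, at the cost of ten times as many 
files, which is the performance trade-off mentioned above.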



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
