Mark Payne created NIFI-1008:
--------------------------------

             Summary: NiFi should swap out FlowFiles to disk even before the 
session is committed
                 Key: NIFI-1008
                 URL: https://issues.apache.org/jira/browse/NIFI-1008
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
            Reporter: Mark Payne


Currently, NiFi will swap out FlowFiles if there are a large number in a 
FlowFile Queue. This is done to avoid running out of JVM heap space. However, 
if we have a simple flow like GetFile -> SplitText and GetFile pulls in a large 
file, SplitText can quickly cause an OutOfMemoryError. This is not because it 
buffers the content of the FlowFile in memory but rather because it holds the 
millions of FlowFile objects in memory. We can do better.

When we call session.transfer for the FlowFiles, once we hit a magical 
threshold (say 10,000), we should swap those FlowFiles to disk and the session 
should transfer them to the queue as "swapped out" FlowFiles, rather than having 
to buffer all of these in memory and then swap them out once they land in 
the queue.
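
As a rough illustration of the idea above, here is a minimal sketch in plain Java. It is not NiFi code: SwappingTransferBuffer, its methods, the use of String IDs in place of FlowFile objects, and the one-ID-per-line swap file format are all hypothetical, chosen only to show records being written to disk each time the in-memory count reaches a threshold during transfer, instead of after everything has been buffered.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT NiFi's ProcessSession or swap API.
// Strings stand in for FlowFile objects to keep the sketch self-contained.
class SwappingTransferBuffer {
    private final int swapThreshold;
    private final Path swapDir;
    private final List<String> inMemory = new ArrayList<>();
    private final List<Path> swapFiles = new ArrayList<>();
    private long swappedCount = 0;

    SwappingTransferBuffer(int swapThreshold, Path swapDir) {
        if (swapThreshold < 1) {
            throw new IllegalArgumentException("swapThreshold must be at least 1");
        }
        this.swapThreshold = swapThreshold;
        this.swapDir = swapDir;
    }

    // Called once per transferred record. As soon as the in-memory batch
    // reaches the threshold it is written to disk and dropped from the heap.
    void transfer(String flowFileId) throws IOException {
        inMemory.add(flowFileId);
        if (inMemory.size() >= swapThreshold) {
            swapOut();
        }
    }

    // Writes the current batch to a new file, one ID per line (assumed format),
    // and keeps only the file path in memory.
    private void swapOut() throws IOException {
        Path swapFile = Files.createTempFile(swapDir, "swap-", ".txt");
        try (BufferedWriter writer = Files.newBufferedWriter(swapFile, StandardCharsets.UTF_8)) {
            for (String id : inMemory) {
                writer.write(id);
                writer.newLine();
            }
        }
        swapFiles.add(swapFile);
        swappedCount += inMemory.size();
        inMemory.clear();
    }

    int inMemoryCount() {
        return inMemory.size();
    }

    long swappedCount() {
        return swappedCount;
    }

    // On commit, a queue could be handed these paths as already swapped-out batches.
    List<Path> swapFiles() {
        return swapFiles;
    }
}
```

With a threshold of 10,000, a split that produces millions of records would never hold more than 10,000 of them on the heap at once in this sketch; the rest exist only as swap files until something reads them back.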



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
