Mark Payne created NIFI-1008:
--------------------------------

Summary: NiFi should swap out FlowFiles to disk even before the session is committed
Key: NIFI-1008
URL: https://issues.apache.org/jira/browse/NIFI-1008
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework
Reporter: Mark Payne

Currently, NiFi swaps FlowFiles out to disk when a large number of them accumulate in a FlowFile Queue. This is done to avoid running out of JVM heap space. However, in a simple flow such as GetFile -> SplitText, if GetFile pulls in a large file, SplitText can quickly cause an OutOfMemoryError. This is not because it buffers the FlowFile's content in memory, but because the session holds millions of FlowFile objects in memory until it is committed.

We can do better. When session.transfer is called for these FlowFiles and the number transferred reaches some threshold (say, 10,000), the session should swap those FlowFiles out to disk and hand them to the queue as already swapped-out FlowFiles, rather than buffering all of them in memory and swapping them out only after they land in the queue.
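A minimal Java sketch of the proposed behavior follows, assuming a hypothetical per-session transfer buffer. The names `SwappingTransferBuffer`, `FlowFileRecord`, `swapOut`, and `swapIn` are illustrative only; they are not NiFi's `ProcessSession` API or its actual swap implementation, and the binary swap-file layout is invented here. The point it shows is that the session never keeps more than `threshold` FlowFile objects on the heap: each full batch is written to a swap file and released, and the queue would later receive the swap files instead of the objects.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/** Hypothetical sketch (not NiFi's API): swap transferred FlowFiles out once a threshold is hit. */
class SwappingTransferBuffer {

    /** Simplified stand-in for a FlowFile: an id plus attributes, no content. */
    static final class FlowFileRecord {
        private final long id;
        private final Map<String, String> attributes;
        FlowFileRecord(long id, Map<String, String> attributes) {
            this.id = id;
            this.attributes = attributes;
        }
        long id() { return id; }
        Map<String, String> attributes() { return attributes; }
    }

    private final int threshold;              // e.g. 10,000 in the proposal
    private final Path swapDir;
    private final List<FlowFileRecord> inMemory = new ArrayList<>();
    private final List<Path> swapFiles = new ArrayList<>();

    SwappingTransferBuffer(int threshold, Path swapDir) {
        this.threshold = threshold;
        this.swapDir = swapDir;
    }

    /** Analogous to session.transfer(): buffer the FlowFile, swapping out when the batch is full. */
    void transfer(FlowFileRecord flowFile) throws IOException {
        inMemory.add(flowFile);
        if (inMemory.size() >= threshold) {
            swapOut();
        }
    }

    /** Writes the current batch to a new swap file and drops it from the heap. */
    private void swapOut() throws IOException {
        Path file = Files.createTempFile(swapDir, "swap-", ".bin");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(inMemory.size());
            for (FlowFileRecord ff : inMemory) {
                out.writeLong(ff.id());
                out.writeInt(ff.attributes().size());
                for (Map.Entry<String, String> e : ff.attributes().entrySet()) {
                    out.writeUTF(e.getKey());
                    out.writeUTF(e.getValue());
                }
            }
        }
        swapFiles.add(file);
        inMemory.clear(); // the swapped batch no longer occupies heap space
    }

    /** Swap files the queue would receive on commit, already swapped out. */
    List<Path> swapFiles() { return Collections.unmodifiableList(swapFiles); }

    /** FlowFiles still below the threshold, handed to the queue normally. */
    List<FlowFileRecord> inMemory() { return Collections.unmodifiableList(inMemory); }

    /** Reads one swap file back, as a queue would when swapping FlowFiles in. */
    static List<FlowFileRecord> swapIn(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int count = in.readInt();
            List<FlowFileRecord> result = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                long id = in.readLong();
                int n = in.readInt();
                Map<String, String> attrs = new HashMap<>();
                for (int j = 0; j < n; j++) {
                    String key = in.readUTF();
                    attrs.put(key, in.readUTF());
                }
                result.add(new FlowFileRecord(id, attrs));
            }
            return result;
        }
    }
}
```

With a threshold of 3 and 7 transfers, this sketch would produce two swap files of 3 FlowFiles each and leave a single FlowFile in memory, so peak heap usage is bounded by the threshold rather than by the number of splits.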