David,

Makes sense. Large values should not be stored as attributes. Attributes are designed for short String values: think 100-200 characters, generally. A couple of KB can be fine as well, but it can significantly reduce performance. If the intent is to “stash the content” so that you can change it, perform enrichment, and then restore it, you should take a look at the ForkEnrichment / JoinEnrichment processors.
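As a rough, back-of-envelope illustration of why oversized attributes matter (not from the thread; all figures below are hypothetical): every FlowFile update re-serializes the full attribute map to the FlowFile repository journal, so stashing content in an attribute multiplies journal write volume by the size of that attribute.

```python
# Back-of-envelope: FlowFile repo write amplification from a large attribute.
# All figures are hypothetical, for illustration only.

def journal_bytes_per_day(flowfiles_per_sec, updates_per_flowfile, attr_bytes):
    """Rough journal volume: each update re-serializes the attribute map."""
    return flowfiles_per_sec * updates_per_flowfile * attr_bytes * 86_400

small = journal_bytes_per_day(100, 5, 200)        # ~200-byte attributes
large = journal_bytes_per_day(100, 5, 1_000_000)  # ~1 MB of stashed content

print(f"small attrs:     {small / 1e9:.1f} GB/day")
print(f"stashed content: {large / 1e9:.1f} GB/day")
```

With the same flow rate, the 1 MB stash generates 5000x the journal volume of a 200-byte attribute, which is consistent with the sudden repository growth described below.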
Thanks
-Mark

On Mar 11, 2024, at 2:05 PM, David Early <david.ea...@grokstream.com> wrote:

Mark,

Yes, it was the FlowFile repository. Of all your points, the large attributes one is most likely our issue. One of our folks was caching the FlowFile content (which can occasionally be large) in an attribute ahead of a DB lookup (which would overwrite the content), then reinstating the content after merging with the DB lookup result. The attribute was not removed after the merge.

We added a couple of items this morning to remove the attribute, but even the brief presence of it may be enough to cause the spikes. I have since attached a very large disk, and I can see the occasional spikes:

[inline image: disk usage graph]

At 22% of a 512 GB disk, that is over 110 GB. What isn't clear is why it is not spiking consistently. We have made some changes to how long the attribute lives and will monitor over the next couple of days, but we will likely need to cache the contents somewhere and retrieve them later unless someone knows of a better solution here.

Thanks for the guidance,
Dave

On Fri, Mar 8, 2024 at 7:05 AM Mark Payne <marka...@hotmail.com> wrote:

Dave,

When you say that the journal files are huge, I presume you mean the FlowFile repository? There are generally 4 things that can cause this:

- OutOfMemoryError causing the FlowFile repo not to checkpoint properly
- Out of disk space causing the FlowFile repo not to checkpoint properly
- Out of open file handles causing the FlowFile repo not to checkpoint properly
- Creating a lot of huge attributes on your FlowFiles

The first 3 situations can be identified by looking for errors in the logs. For the fourth one, you need to determine whether or not you’re creating huge FlowFile attributes. Generally, attributes should be very small: 100-200 characters or less, ideally.
It’s possible that you have a flow that creates huge attributes but is only running on the Primary Node, and Node 2 is your Primary Node, which would cause this to occur only on that node.

Thanks
-Mark

> On Mar 7, 2024, at 9:24 PM, David Early via users <users@nifi.apache.org> wrote:
>
> I have a massive issue: I have a 2-node cluster (using 5 external ZooKeepers on other boxes), and for some reason on node 2 I have MASSIVE journal files.
>
> I am round-robining data between the nodes, but for some reason node 2 just fills up. This is the second time this has happened this week.
>
> What should I do? nifi.properties is the same on both systems (except for local host names).
>
> Any ideas of what might be causing one node to overload?
>
> Dave
>
> --
> David Early, Ph.D.
> david.ea...@grokstream.com
> 720-470-7460 Cell
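The first three causes Mark lists above (OOM, disk space, file handles) leave traces in the logs and on the filesystem, so a quick triage script can check them alongside the journal size. A minimal sketch; the directory paths below are typical defaults and are assumptions about your install, not values from the thread:

```python
# Quick triage for a bloated FlowFile repository, following the causes
# listed above. Paths are assumed defaults; adjust for your install.
import shutil
from pathlib import Path

LOG_DIR = Path("/opt/nifi/logs")                  # assumption: default layout
REPO_DIR = Path("/opt/nifi/flowfile_repository")  # assumption: default layout

def journal_size_bytes(repo_dir: Path) -> int:
    """Total size of FlowFile repository journal files."""
    return sum(p.stat().st_size for p in repo_dir.rglob("*.journal"))

def log_hits(log_dir: Path, needle: str) -> int:
    """Count occurrences of an error string across nifi-app logs."""
    hits = 0
    for log in log_dir.glob("nifi-app*.log"):
        hits += log.read_text(errors="replace").count(needle)
    return hits

if __name__ == "__main__":
    total, used, free = shutil.disk_usage("/")
    print(f"disk free:    {free / 1e9:.1f} GB")
    print(f"journal size: {journal_size_bytes(REPO_DIR) / 1e9:.1f} GB")
    for needle in ("OutOfMemoryError",
                   "No space left on device",
                   "Too many open files"):
        print(f"{needle!r} hits: {log_hits(LOG_DIR, needle)}")
```

Running this on both nodes and diffing the output would show whether node 2 alone is hitting one of the log-visible causes, or whether the growth points at large attributes instead.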