Huagen,

Just to close the loop on this one, I also wanted to point out this JIRA, https://issues.apache.org/jira/browse/NIFI-1926, for a general-purpose aggregation processor, which would indeed support multiple connections and configurable aggregation, release, and correlation strategies. It would be nice if you could describe your use case in that JIRA, so we can start gathering these use cases.
Cheers,
Oleg

On Jun 3, 2016, at 2:33 AM, Huagen peng <huagen.p...@gmail.com> wrote:

Thanks for the reply, Andy. I ended up abandoning my previous approach and instead using ExecuteStreamCommand (running the zcat command on the GZ files) to output all the files I want to concatenate, then performing some data manipulation and saving the result as a single file.

Huagen

On Jun 3, 2016, at 12:29 AM, Andy LoPresto <alopre...@apache.org> wrote:

Huagen,

Sorry, I am a little confused. My understanding is that you want to combine n individual logs (each in its own flowfile) from a specific hour into a single file. What is confusing is when you say "Even with that [a 5x confirmation loop], I occasionally still get more than one merged flowfile." Do you mean that what you expected to be combined into a single flowfile is output as two distinct and incomplete flowfiles?

Without seeing a template of your workflow, I can make a couple of suggestions. First, as James Wing mentioned last night, I would encourage you to look at the MergeContent [1] processor properties, which let you set a high threshold for merging flowfiles. If you know the number of log files per hour a priori, you can set it as the "Minimum Number of Entries" to ensure that output waits until that many flowfiles have accumulated.

Also, given that you have described a "loop", I imagine you may have multiple connections feeding into MergeContent. MergeContent can have unexpected behavior with multiple incoming connections, so I would recommend adding a Funnel to aggregate all incoming connections and provide a single incoming connection to MergeContent.

Please let us know if this helps; if not, please share a template and some sample input if possible. Thanks.
[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.MergeContent/index.html

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

On Jun 1, 2016, at 11:52 AM, Huagen peng <huagen.p...@gmail.com> wrote:

Hi,

In the data flow I am dealing with now, there are multiple (up to 200) logs associated with a given hour. I need to process these hourly log fragments and then concatenate them into a single file. My current approach uses an UpdateAttribute processor to set an arbitrary segment.original.filename attribute on all the flowfiles I want to merge. I then use a MergeContent processor, with an UpdateAttribute and a RouteOnAttribute processor forming a loop, to confirm five times that the merge is complete. Even with that, I occasionally still get more than one merged flowfile. Is there a better way to do this? Or should I increase the loop count, say to 10? Thanks.

Huagen
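For readers finding this thread later: the zcat-based approach Huagen settled on can be sketched outside NiFi as a plain shell pipeline. This is a minimal illustration, not Huagen's actual command; the file names and the hour key are hypothetical, and the fake fragments are created only so the example is self-contained.

```shell
#!/bin/sh
set -e

hour="2016-06-01-14"          # hypothetical hour key for the fragments
workdir=$(mktemp -d)

# Create two fake gzipped hourly log fragments for the demo.
printf 'line-a\n' | gzip > "$workdir/app.$hour.0.log.gz"
printf 'line-b\n' | gzip > "$workdir/app.$hour.1.log.gz"

# zcat streams the decompressed contents of every matching fragment in
# sequence, so one redirect yields a single merged file for the hour --
# no correlation attributes or confirmation loops required.
zcat "$workdir"/app."$hour".*.log.gz > "$workdir/merged.$hour.log"

cat "$workdir/merged.$hour.log"
```

Inside NiFi, the equivalent would be an ExecuteStreamCommand processor invoking zcat over the hour's fragments, with any further data manipulation applied to the merged stream before it is persisted.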