Huagen,

Just to close the loop on this one, I also wanted to point out this JIRA 
https://issues.apache.org/jira/browse/NIFI-1926 for a general-purpose aggregation 
processor, which would support multiple connections and configurable 
aggregation, release, and correlation strategies.
It would be nice if you could describe your use case in that JIRA, so we can 
start gathering these use cases.

Cheers
Oleg

On Jun 3, 2016, at 2:33 AM, Huagen peng 
<huagen.p...@gmail.com<mailto:huagen.p...@gmail.com>> wrote:

Thanks for the reply, Andy.

I ended up abandoning my previous approach and using ExecuteStreamCommand to 
output (with the zcat command on GZ files) all the files I want to concatenate.  
I then perform some data manipulation and save the file.
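
For anyone reading this later, the zcat step amounts to decompressing each GZ 
file in order and appending the result to one output. A minimal Python sketch of 
the same idea follows; the function name concat_gz and the file names in the 
usage note are hypothetical, and this is not the actual command used in the flow.

```python
import gzip
import shutil


def concat_gz(sources, dest):
    """Decompress each .gz file in the given order and append its bytes to dest."""
    with open(dest, "wb") as out:
        for src in sources:
            # gzip.open in binary mode yields the decompressed bytes.
            with gzip.open(src, "rb") as part:
                shutil.copyfileobj(part, out)
```

For example, concat_gz(["part-001.gz", "part-002.gz"], "merged.log") would write 
the decompressed contents of both parts, in that order, to merged.log.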

Huagen

On Jun 3, 2016, at 12:29 AM, Andy LoPresto 
<alopre...@apache.org<mailto:alopre...@apache.org>> wrote:

Huagen,

Sorry, I am a little confused. My understanding is that you want to combine n 
individual logs (each with a respective flowfile) from a specific hour into a 
single file. What is confusing is when you say “Even with that [a 5* 
confirmation loop], I occasionally still get more than one merged flowfile.” Do 
you mean that what you expected to be combined into a single flowfile is output 
as two distinct and incomplete flowfiles?

Without seeing a template of your work flow, I can make a couple of suggestions.

First, as mentioned last night by James Wing, I would encourage you to look at 
the MergeContent [1] processor properties to provide a high threshold for 
merging flowfiles. If you know the number of log files per hour a priori, you 
can set that as the “Minimum Number of Entries” and ensure that output will 
wait until that many flowfiles have been accumulated.
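
To illustrate the idea of holding output until a minimum count is reached, here 
is a toy Python model. It is only a sketch of the concept, not NiFi code or 
MergeContent's implementation: the class name MinEntriesBinner and its methods 
are hypothetical, the key stands in loosely for a correlation attribute value 
such as the hour, and the real processor has other properties that also affect 
when a bin is released.

```python
from collections import defaultdict


class MinEntriesBinner:
    """Toy model: collect items per key and release a bin once it holds min_entries items."""

    def __init__(self, min_entries):
        self.min_entries = min_entries
        self.bins = defaultdict(list)

    def offer(self, key, content):
        """Add one item; return the merged content if the bin is now full, else None."""
        items = self.bins[key]
        items.append(content)
        if len(items) >= self.min_entries:
            # The bin is complete: remove it and emit a single merged result.
            del self.bins[key]
            return "".join(items)
        return None
```

With min_entries set to the known number of log files per hour, each key yields 
exactly one merged result, and nothing is emitted while a bin is still short.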

Also, given that you have described a “loop”, I would imagine you may have 
multiple connections feeding into MergeContent. MergeContent can have 
unexpected behavior with multiple incoming connections, and so I would 
recommend adding a Funnel to aggregate all incoming connections and provide a 
single incoming connection to MergeContent.

Please let us know if this helps, and if not, please share a template and some 
sample input if possible. Thanks.

[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.MergeContent/index.html


Andy LoPresto
alopre...@apache.org<mailto:alopre...@apache.org>
alopresto.apa...@gmail.com<mailto:alopresto.apa...@gmail.com>
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Jun 1, 2016, at 11:52 AM, Huagen peng 
<huagen.p...@gmail.com<mailto:huagen.p...@gmail.com>> wrote:

Hi,

In the data flow I am dealing with now, there are multiple (up to 200) logs 
associated with a given hour.  I need to process these hourly log fragments and 
then concatenate them into a single file.  The approach I am using now has an 
UpdateAttribute processor that sets an arbitrary segment.original.filename 
attribute on all the flowfiles I want to merge.  Then I use a MergeContent 
processor, with an UpdateAttribute and a RouteOnAttribute processor forming a 
loop that confirms five times that the merge is complete.  Even with that, I 
occasionally still get more than one merged flowfile.

Is there a better way to do this?  Or should I increase the loop count, say to 10?

Thanks.

Huagen


