Igor,

MergeContent will consider a 'bin' full as soon as any one of those
conditions is met. For example, if you set:

Max Group Size = 64 MB
Max Number of Entries = 100
Max Bin Age = 5 mins

Then you will get a merged bin whenever a bin hits 64 MB, regardless of how
long it's been or how many entries there are.
Similarly, if you have 100 entries, then you'll get a bin even if the data is
only 1 MB total.
Also, if you go 5 minutes without reaching either of those thresholds, the
5-minute Max Bin Age will cause the bin to be merged, regardless of how many
FlowFiles there are.
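
As a rough illustration, the "whichever threshold is hit first" rule can be
sketched in a few lines of Python. This is a simplified model of the behavior
described above, not NiFi's actual implementation; the function name and
parameters are made up for the example.

```python
MB = 1024 * 1024

def bin_is_ready(total_bytes, entry_count, age_seconds,
                 max_group_size=64 * MB,      # Max Group Size = 64 MB
                 max_entries=100,             # Max Number of Entries = 100
                 max_bin_age_seconds=5 * 60): # Max Bin Age = 5 mins
    """Return True as soon as ANY one of the three thresholds is reached."""
    return (total_bytes >= max_group_size
            or entry_count >= max_entries
            or age_seconds >= max_bin_age_seconds)

# Size alone, entry count alone, or age alone is enough to trigger a merge.
print(bin_is_ready(64 * MB, 3, 10))    # size threshold reached
print(bin_is_ready(1 * MB, 100, 10))   # entry threshold reached
print(bin_is_ready(1 * MB, 2, 300))    # age threshold reached
print(bin_is_ready(1 * MB, 2, 10))     # nothing reached yet
```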

A common pattern for sending to HDFS is to set the Max Bin Age to some
threshold (5 mins or 1 hour or whatever makes sense for you), the Min Group
Size to 64 MB, and the Max Group Size to 128 MB, and to leave the Maximum
Number of Entries unset. In this case, you will get bins of 64 - 128 MB most
of the time, but if the data volume is low for a while, you'll still get some
data flowing into HDFS because of the Max Bin Age.
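
To make that pattern concrete, here is a small Python simulation. It is only
a sketch under simplifying assumptions: a single bin, the age check happening
when the next FlowFile arrives (NiFi itself checks on its own schedule), and
hypothetical names throughout. It is not how MergeContent is implemented.

```python
MB = 1024 * 1024

def simulate_bins(arrivals, min_size=64 * MB, max_size=128 * MB, max_age=300):
    """arrivals: list of (time_seconds, size_bytes), sorted by time.

    Returns the sizes (in bytes) of the merged bins, in order.
    """
    merged = []
    total, started = 0, None
    for t, size in arrivals:
        # Flush the current bin if it is too old or the new file would overflow it.
        if total > 0 and (t - started >= max_age or total + size > max_size):
            merged.append(total)
            total, started = 0, None
        if total == 0:
            started = t  # a new bin starts with this file
        total += size
        # Once the minimum size is met, the bin is eligible to be merged.
        if total >= min_size:
            merged.append(total)
            total, started = 0, None
    if total > 0:
        merged.append(total)  # the leftover bin would eventually age out
    return merged

# Steady traffic yields a bin of at least 64 MB; the later trickle of small
# files is still flushed, because of the Max Bin Age.
arrivals = [(0, 40 * MB), (10, 30 * MB), (20, 10 * MB), (400, 5 * MB)]
print([size // MB for size in simulate_bins(arrivals)])
```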

Thanks
-Mark

> On May 31, 2016, at 12:07 PM, Igor Kravzov <igork.ine...@gmail.com> wrote:
> 
> There are 2 configuration properties: Maximum Group Size and Maximum Number 
> of entries.
> Are these mutually exclusive? I want to create a file to store in HDFS but 
> limit size at 64MB as HDFS block (or should I go bigger?).
> 
> Max Bin Age property
> Since content can vary in length and I do not know when the max size will 
> be reached, what role will it play?
