James,
This sounds like an interesting project. I would recommend
RouteOnAttribute with a "sample" property whose value is
"${random():mod(1032):equals(100)}" (the number passed to equals() could be
anything from 0 to 1031), and then routing the "sample" relationship to your
sampling path. I'm not sure I un
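To make the routing predicate concrete, here is a small Python simulation of "${random():mod(1032):equals(100)}". It is only a sketch: NiFi's random() is emulated with random.getrandbits(63), on the assumption that it returns a non-negative 64-bit long, and the helpers is_sampled and route are hypothetical names, not NiFi APIs.

```python
import random

MODULUS = 1032  # value passed to mod()
TARGET = 100    # value passed to equals(); any value 0..MODULUS-1 works

def is_sampled(rng: random.Random) -> bool:
    """Emulate ${random():mod(1032):equals(100)} for a single flowfile."""
    value = rng.getrandbits(63)       # assumed stand-in for NiFi's random()
    return value % MODULUS == TARGET  # true with (almost exactly) probability 1/MODULUS

def route(n_flowfiles: int, seed: int = 42) -> int:
    """Count how many of n_flowfiles would go to the hypothetical "sample" relationship."""
    rng = random.Random(seed)
    return sum(is_sampled(rng) for _ in range(n_flowfiles))

if __name__ == "__main__":
    n = 1_032_000
    hits = route(n)
    print(f"{hits} of {n} flowfiles sampled (expected about {n // MODULUS})")
```

Because each flowfile is tested independently, this keeps about one in 1032 flowfiles on average; the exact count varies from run to run.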
Also, I just realized I misread your sampling requirement. You would use
the approach above if you wanted to sample *roughly one in every 1032
flowfiles*, but you want a sample size of 1032 total. You can still use a
randomized selection approach as I described (though your mod value would
depend on what freq
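If you can estimate the total number of flowfiles in advance (an assumption here), the modulus follows from simple arithmetic: each flowfile is kept with probability 1/m, so the expected sample size is total / m. The helpers modulus_for and sample_size_stats below are hypothetical, and they only pin down the *expected* size, not an exact count.

```python
import math

def modulus_for(expected_total: int, target_sample: int) -> int:
    """Modulus m so that ${random():mod(m):equals(k)} keeps about target_sample flowfiles."""
    if expected_total <= 0 or target_sample <= 0:
        raise ValueError("expected_total and target_sample must be positive")
    # E[sample size] = expected_total / m, so solve for m and round to an integer.
    return max(1, round(expected_total / target_sample))

def sample_size_stats(expected_total: int, modulus: int) -> tuple[float, float]:
    """Mean and standard deviation of the (binomial) sample size."""
    p = 1 / modulus
    mean = expected_total * p
    stddev = math.sqrt(expected_total * p * (1 - p))
    return mean, stddev

if __name__ == "__main__":
    m = modulus_for(1_032_000, 1032)
    mean, sd = sample_size_stats(1_032_000, m)
    print(m, round(mean), round(sd, 1))
```

With an estimate of 1,032,000 flowfiles, modulus_for gives 1000, and the sample size would typically land within a few dozen of 1032 (standard deviation about 32). If the total must be exactly 1032, a reservoir-style approach is a better fit.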
If you have large FlowFiles and are trying to sample records from
each, you can use SampleRecord. It has Interval Sampling,
Probabilistic Sampling, and Reservoir Sampling strategies, and I have
a PR [1] up to add Range Sampling [2].
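For anyone unfamiliar with those strategies, here are textbook Python sketches of interval, probabilistic, and reservoir sampling over a stream of records. They are written for illustration only and are not SampleRecord's actual implementation; the function names are hypothetical. Reservoir sampling (Algorithm R) is the one that yields an exact sample size, such as 1032, from a stream whose length is not known in advance.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def interval_sample(records: Iterable[T], interval: int) -> List[T]:
    """Keep the records at positions 0, interval, 2*interval, ..."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return [r for i, r in enumerate(records) if i % interval == 0]

def probabilistic_sample(records: Iterable[T], probability: float,
                         rng: random.Random) -> List[T]:
    """Keep each record independently with the given probability."""
    return [r for r in records if rng.random() < probability]

def reservoir_sample(records: Iterable[T], size: int,
                     rng: random.Random) -> List[T]:
    """Algorithm R: one pass, uniform sample of exactly `size` records
    (or every record, if the stream is shorter than `size`)."""
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < size:
            reservoir.append(record)   # fill the reservoir first
        else:
            j = rng.randint(0, i)      # uniform over 0..i inclusive
            if j < size:
                reservoir[j] = record  # happens with probability size/(i+1)
    return reservoir

if __name__ == "__main__":
    rng = random.Random(7)
    records = range(100_000)
    print(len(interval_sample(records, 100)))             # 1000
    print(len(probabilistic_sample(records, 0.01, rng)))  # roughly 1000
    print(len(reservoir_sample(records, 1032, rng)))      # 1032
```

Interval and probabilistic sampling only fix the sampling *rate*; reservoir sampling fixes the sample *size* at the cost of holding `size` records in memory.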
Regards,
Matt
[1] https://github.com/apache/nifi/pull/5878
[2] h
For your use case, drawing a random sample of N records is one method, and
it of course depends on the distribution you use. The expression language
guide documents a uniform distribution, e.g.
${random():mod(10):plus(1)} returns a random number between 1 and 10 inclusive.
There are numerous algorithms
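As a quick check on that expression, this Python snippet emulates ${random():mod(10):plus(1)} and tallies the results. It again assumes random() behaves like a non-negative 63-bit integer (emulated with random.getrandbits(63)), and random_1_to_10 and tally are hypothetical helper names.

```python
import random
from collections import Counter

def random_1_to_10(rng: random.Random) -> int:
    """Emulate ${random():mod(10):plus(1)}: 0..9 from mod(10), shifted to 1..10."""
    return rng.getrandbits(63) % 10 + 1

def tally(draws: int, seed: int = 0) -> Counter:
    """Count how often each value 1..10 comes up over `draws` evaluations."""
    rng = random.Random(seed)
    return Counter(random_1_to_10(rng) for _ in range(draws))

if __name__ == "__main__":
    counts = tally(100_000)
    for value in sorted(counts):
        print(value, counts[value])
```

Every value lands in 1..10 and the counts come out roughly equal. Strictly, taking a 63-bit value mod 10 favors 1 through 8 very slightly, but the bias is on the order of 10^-18 and irrelevant for sampling.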
These replies have all been very helpful, and I wanted to get back to you
and say thanks. We will have both situations to contend with: large numbers
of flowfiles, each representing an atomic record or object, and smaller
numbers of very large flowfiles from which we will draw a sample of
records. I