James,

This sounds like an interesting project. I would recommend using
RouteOnAttribute with a user-defined property named "sample" whose value is
"${random():mod(1032):equals(100)}" (the value passed to equals() can be
anything from 0 to 1031), and then routing the "sample" relationship to your
sampling path. Note that this tests each FlowFile independently, so it
selects roughly one FlowFile in 1,032 rather than a fixed count of 1,032.
I'm not sure I understand the stratified date aspect, but you may be able to
add routing there as well if only certain dates should be sampled.
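As a rough illustration of what that expression does, here is a plain-Python sketch (not NiFi code) that emulates it for a stream of FlowFiles. It assumes NiFi's random() yields a non-negative long. The record layout (a dict with a "date" key) and the helper sample_path are hypothetical, meant only to show a date-window route placed in front of the random route.

```python
import random
from datetime import date

MODULUS = 1032  # corresponds to mod(1032) in the expression
TARGET = 100    # corresponds to equals(100); any value from 0 to MODULUS - 1 works


def routes_to_sample(rng: random.Random) -> bool:
    """Emulate ${random():mod(1032):equals(100)} for a single FlowFile.

    Assumes random() yields a non-negative 63-bit value, so the remainder
    is always in 0..MODULUS-1 and the test passes with probability of
    roughly 1/MODULUS.
    """
    return rng.getrandbits(63) % MODULUS == TARGET


def sample_path(records, start: date, end: date, rng: random.Random):
    """Hypothetical helper: date-window routing followed by the random route.

    Each record is assumed to be a dict with a "date" key; only records whose
    date falls in [start, end) are eligible for sampling.
    """
    return [r for r in records if start <= r["date"] < end and routes_to_sample(rng)]


if __name__ == "__main__":
    rng = random.Random(42)
    trials = 1032 * 1000
    hits = sum(routes_to_sample(rng) for _ in range(trials))
    print(f"{hits} of {trials} routed to 'sample' (expected about 1000)")
```

Changing MODULUS changes the sampling rate; changing TARGET only changes which remainder wins, not how often.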

Hope this helps,
Joe

On Thu, May 19, 2022 at 6:20 AM James McMahon <jsmcmah...@gmail.com> wrote:

> I have been tasked with drawing samples from very large raw data sets for
> triage analysis, and I need to provide multiple sampling methods. One
> method is to draw a random sample of N records. A second method is to draw
> a fixed sample of 1,032 records stratified by defined date boundaries in a
> set. The latter is of interest because raw data can change structure, or
> even format, substantially at certain points in time, and we need to be
> able to sample within those boundaries.
>
> Can anyone point me to an example of how NiFi can be used to draw
> samples, randomly and/or systematically, from raw data collections?
>
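For the fixed-size method in the question, assuming it means up to 1,032 records per date stratum, one common technique is reservoir sampling ("Algorithm R") run separately within each stratum, which gives a uniform sample in a single pass without knowing stratum sizes in advance. This is a plain-Python sketch, not a NiFi feature; the boundary dates in BOUNDARIES, the "date" field, and the helper names date_stratum and stratified_reservoir are hypothetical.

```python
import bisect
import random
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")

# Hypothetical boundaries where the raw data changed structure or format.
BOUNDARIES = [date(2020, 1, 1), date(2021, 6, 1)]


def date_stratum(record: dict) -> int:
    """Map a record to a stratum index: 0 before the first boundary, 1 between
    the first and second boundary, and so on. Assumes a "date" key."""
    return bisect.bisect_right(BOUNDARIES, record["date"])


def stratified_reservoir(
    records: Iterable[T],
    stratum_of: Callable[[T], Hashable],
    k: int,
    rng: random.Random,
) -> Dict[Hashable, List[T]]:
    """Keep a uniform random sample of up to k records per stratum in one pass."""
    reservoirs: Dict[Hashable, List[T]] = defaultdict(list)
    seen: Dict[Hashable, int] = defaultdict(int)
    for record in records:
        key = stratum_of(record)
        seen[key] += 1
        reservoir = reservoirs[key]
        if len(reservoir) < k:
            reservoir.append(record)  # fill the reservoir first
        else:
            j = rng.randrange(seen[key])  # uniform in 0..seen[key]-1
            if j < k:
                reservoir[j] = record  # replaced with probability k/seen[key]
    return dict(reservoirs)


if __name__ == "__main__":
    rng = random.Random(7)
    base = date(2019, 1, 1).toordinal()
    records = [{"id": i, "date": date.fromordinal(base + i % 1500)} for i in range(20000)]
    samples = stratified_reservoir(records, date_stratum, 1032, rng)
    for stratum, picked in sorted(samples.items()):
        print(stratum, len(picked))
```

A stratum with fewer than 1,032 records simply returns all of them, so the per-stratum count is a ceiling rather than a guarantee.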
