Great, thanks for the clarification!

Bill

> On Mar 23, 2016, at 10:52 AM, Frances Perry <[email protected]> wrote:
>
> > Perhaps I'm unclear on what an “Aggregator” is. I assumed that a line such as
> > the following:
> >
> > PCollection<KV<String, Double>> meanByName =
> >     dataPoints.apply(Mean.<String, Double>perKey());
> >
> > …would be considered an Aggregator, since it applies a mean aggregation over
> > a window. Is that correct, with respect to the Beam terminology? If not, what
> > would an example of an Aggregator be?
>
> Ah, we may have some slightly confusing terminology here.
>
> In that code snippet you are using a PTransform (Mean.perKey) to combine a
> PCollection using the Mean CombineFn
> <https://github.com/apache/incubator-beam/blob/c199f085473cfcd79014d0a022b5ce3fdd4863ec/sdk/src/main/java/com/google/cloud/dataflow/sdk/transforms/Combine.java#L359>.
> An Aggregator
> <https://github.com/apache/incubator-beam/blob/211e76abf9ba34c35ef13cca279cbeefdad7c406/sdk/src/main/java/com/google/cloud/dataflow/sdk/transforms/Aggregator.java#L54>
> takes a CombineFn and applies it continuously within a DoFn. So it's more
> analogous to a 'counter'. You can see an example of aggregators in
> DebuggingWordCount
> <https://github.com/apache/incubator-beam/blob/master/examples/src/main/java/com/google/cloud/dataflow/examples/DebuggingWordCount.java#L129>.
>
> We never really used the term aggregation to refer to a general set of
> PTransforms until we started describing things to the community. But it is a
> useful word, so we've ended up in a bit of a confusing state. Maybe we should
> consider renaming Aggregator? Something like "metric" might be clearer.
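
For anyone reading this thread later, here is a minimal plain-Java sketch of the distinction described in the quoted reply. It is an analogy only and deliberately does not use the Beam/Dataflow SDK: the names AggregatorAnalogy, DataPoint, dropUnnamed and meanByName are hypothetical and invented for illustration. The idea is that the per-key mean is the output of the computation itself (like Mean.perKey), while the counter is a side-channel value bumped while elements are being processed (roughly like an Aggregator inside a DoFn).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Plain-Java analogy only; none of these names are Beam or Dataflow SDK APIs.
class AggregatorAnalogy {

    // Hypothetical stand-in for a keyed element such as KV<String, Double>.
    static final class DataPoint {
        final String name;
        final double value;

        DataPoint(String name, double value) {
            this.name = name;
            this.value = value;
        }
    }

    // Analogue of a DoFn that filters elements and, as a side effect,
    // bumps a counter for every element it drops (the "Aggregator" role).
    static List<DataPoint> dropUnnamed(List<DataPoint> input, AtomicLong dropped) {
        List<DataPoint> kept = new ArrayList<>();
        for (DataPoint p : input) {
            if (p.name.isEmpty()) {
                dropped.incrementAndGet(); // monitoring value, not pipeline output
            } else {
                kept.add(p);
            }
        }
        return kept;
    }

    // Analogue of a combining transform: the per-key mean IS the output data.
    static Map<String, Double> meanByName(List<DataPoint> points) {
        Map<String, double[]> sumAndCount = new LinkedHashMap<>();
        for (DataPoint p : points) {
            double[] acc = sumAndCount.computeIfAbsent(p.name, k -> new double[2]);
            acc[0] += p.value; // running sum for this key
            acc[1] += 1;       // running count for this key
        }
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : sumAndCount.entrySet()) {
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return means;
    }

    public static void main(String[] args) {
        List<DataPoint> points = Arrays.asList(
                new DataPoint("a", 1.0),
                new DataPoint("", 9.0),
                new DataPoint("a", 3.0),
                new DataPoint("b", 5.0));
        AtomicLong dropped = new AtomicLong();
        Map<String, Double> means = meanByName(dropUnnamed(points, dropped));
        System.out.println("means = " + means);
        System.out.println("dropped = " + dropped.get());
    }
}
```

In this sketch the map of means is what downstream code would consume, whereas the dropped count is only useful for inspecting how the processing went, which is why "counter" or "metric" seems like a reasonable way to think about it.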
