Great, thanks for the clarification!

Bill

> On Mar 23, 2016, at 10:52 AM, Frances Perry <[email protected]> wrote:
> 
> 
> Perhaps I'm unclear on what an “Aggregator” is. I assumed that a line such as 
> the following:
> 
>       PCollection<KV<String, Double>> meanByName =
>           dataPoints.apply(Mean.<String, Double>perKey());
> 
> …would be considered an Aggregator, since it applies a mean aggregation over 
> a window. Is that correct, with respect to the Beam terminology? If not, what 
> would an example of an Aggregator be?
> 
> Ah, we may have some slightly confusing terminology here. 
> 
> In that code snippet you are using a PTransform (Mean.perKey) to combine a 
> PCollection using the Mean CombineFn 
> <https://github.com/apache/incubator-beam/blob/c199f085473cfcd79014d0a022b5ce3fdd4863ec/sdk/src/main/java/com/google/cloud/dataflow/sdk/transforms/Combine.java#L359>.
>  An Aggregator 
> <https://github.com/apache/incubator-beam/blob/211e76abf9ba34c35ef13cca279cbeefdad7c406/sdk/src/main/java/com/google/cloud/dataflow/sdk/transforms/Aggregator.java#L54>
>  takes a CombineFn and applies it continuously within a DoFn. So it's more 
> analogous to a 'counter'. You can see an example of aggregators in 
> DebuggingWordCount 
> <https://github.com/apache/incubator-beam/blob/master/examples/src/main/java/com/google/cloud/dataflow/examples/DebuggingWordCount.java#L129>.
> 
> We never really used the term aggregation to refer to a general set of 
> PTransforms until we started describing things to the community. But it is a 
> useful word, so we've ended up in a bit of a confusing state. Maybe we should 
> consider renaming Aggregator? Something like "metric" might be clearer.
> 
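
To make the distinction above concrete, here is a minimal sketch in plain Java. It deliberately does not use the SDK: `AggregatorSketch`, `SideAggregator`, `meanPerKey`, and `keepNonEmpty` are hypothetical names made up for illustration, standing in roughly for an Aggregator, a per-key combining transform, and a DoFn. It only models the idea described in the quoted reply and is not how the SDK implements either concept.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical sketch: none of these types are Beam/Dataflow SDK classes.
final class AggregatorSketch {

    // Stand-in for an Aggregator: folds values with a combining
    // function as a side channel, like a counter.
    static final class SideAggregator<T> {
        private final BinaryOperator<T> combiner;
        private T value;

        SideAggregator(T zero, BinaryOperator<T> combiner) {
            this.value = zero;
            this.combiner = combiner;
        }

        void addValue(T v) {
            value = combiner.apply(value, v);
        }

        T get() {
            return value;
        }
    }

    // Stand-in for a combining transform: the per-key mean IS the output data.
    static Map<String, Double> meanPerKey(List<Map.Entry<String, Double>> points) {
        Map<String, double[]> acc = new LinkedHashMap<>();
        for (Map.Entry<String, Double> p : points) {
            // acc value holds {running sum, element count} for the key.
            double[] sumCount = acc.computeIfAbsent(p.getKey(), k -> new double[2]);
            sumCount[0] += p.getValue();
            sumCount[1] += 1;
        }
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return means;
    }

    // Stand-in for a DoFn: the filtered list is the real output; the
    // aggregator is only updated on the side while elements are processed.
    static List<String> keepNonEmpty(List<String> words, SideAggregator<Long> emptyCount) {
        List<String> out = new ArrayList<>();
        for (String w : words) {
            if (w.isEmpty()) {
                emptyCount.addValue(1L);
            } else {
                out.add(w);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> points =
            List.of(Map.entry("a", 1.0), Map.entry("a", 3.0), Map.entry("b", 5.0));
        System.out.println(meanPerKey(points));

        SideAggregator<Long> empties = new SideAggregator<>(0L, Long::sum);
        System.out.println(keepNonEmpty(List.of("x", "", "y", ""), empties));
        System.out.println(empties.get());
    }
}
```

In `meanPerKey` the combined value is the result that flows downstream. In `keepNonEmpty` the result is the filtered list, and the count is only observed on the side, which is why "counter" or "metric" describes it well.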
