On Wed, Apr 17, 2019 at 7:48 AM Viliam Durina wrote:
> > Combine.perKey ... certainly is standardized / well-defined
>
> Is there any document where it's defined?
>
At the user level, here:
https://beam.apache.org/documentation/programming-guide/#combine
There are a few places that define it.
> Combine.perKey ... certainly is standardized / well-defined
Is there any document where it's defined?
Viliam
On Tue, 16 Apr 2019 at 18:27, Kenneth Knowles wrote:
On Tue, Apr 16, 2019 at 9:18 AM Reuven Lax wrote:
> A common request (especially in streaming) is to support sorting values by
> timestamp, not by the full value.
>
On this point, I think an explicit secondary key probably addresses the
need. Naively implemented, the "sort by values" use case
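
To make the secondary-key idea concrete, here is a minimal sketch in plain Python (not Beam API). It assumes each record is a (key, timestamp, payload) tuple; the function name group_and_sort_by_secondary is hypothetical and used only for illustration. Values are grouped by the primary key and ordered only by the secondary key, so the payload itself never has to be comparable.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def group_and_sort_by_secondary(
    records: Iterable[Tuple[str, int, Any]]
) -> Dict[str, List[Any]]:
    """Hypothetical helper: group by primary key, order by secondary key.

    Each record is assumed to be (key, timestamp, payload).
    """
    groups: Dict[str, List[Tuple[int, Any]]] = defaultdict(list)
    for key, timestamp, payload in records:
        groups[key].append((timestamp, payload))

    result: Dict[str, List[Any]] = {}
    for key, items in groups.items():
        # Sort on the timestamp only; the payload is never compared.
        items.sort(key=lambda item: item[0])
        result[key] = [payload for _, payload in items]
    return result


records = [("a", 3, "x"), ("b", 1, "y"), ("a", 1, "z")]
grouped = group_and_sort_by_secondary(records)
```

In a real runner the sort would happen in the shuffle rather than in memory; the sketch only shows the intended semantics.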
This is a good conversation. Some things to consider:
Since Beam is cross-language, the "shufflers" can usually sort only by
binary value. This differs from other systems, where custom comparators
can be used for sorting. We might need to introduce an OrderPreservingCoder,
and mark the coders
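
To illustrate what "order preserving" would mean for a coder, here is a small sketch in plain Python. It is not an existing Beam coder; the function names are hypothetical. It assumes signed 64-bit integers and encodes them as big-endian with the sign bit flipped, so that comparing the encoded bytes gives the same order as comparing the numbers. A little-endian encoding is included as a counterexample.

```python
import struct

_OFFSET = 1 << 63  # shifts the signed int64 range onto the unsigned range


def encode_int64_order_preserving(value: int) -> bytes:
    """Hypothetical encoder: bytewise order matches numeric order."""
    if not -_OFFSET <= value < _OFFSET:
        raise ValueError("value out of int64 range")
    # Big-endian unsigned bytes compare lexicographically like the numbers.
    return struct.pack(">Q", value + _OFFSET)


def decode_int64_order_preserving(data: bytes) -> int:
    """Inverse of encode_int64_order_preserving."""
    (unsigned,) = struct.unpack(">Q", data)
    return unsigned - _OFFSET


def encode_int64_little_endian(value: int) -> bytes:
    """Counterexample: a valid encoding that does not preserve order."""
    return struct.pack("<q", value)


values = [5, -3, 0, 1 << 40, -1, 256, 1]
sorted_by_bytes = sorted(values, key=encode_int64_order_preserving)
sorted_by_le_bytes = sorted([1, 256], key=encode_int64_little_endian)
```

Here sorted_by_bytes equals sorted(values), while the little-endian encoding places 256 before 1 when sorted by bytes.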
1. This is clearly useful, and extensively used. Agree with all that. I
think it can work for batch and streaming equally well if sorting is
required only per "pane", though I might be overlooking something.
2. A transform need not be primitive to be well-defined and executed in a
special way by