Re: [DISCUSS] Adding GroupByKeyAndSort

2019-04-30 Thread Kenneth Knowles
On Wed, Apr 17, 2019 at 7:48 AM Viliam Durina wrote:

> > Combine.perKey ... certainly is standardized / well-defined
>
> Is there any document where it's defined?
>

At the user level, here: https://beam.apache.org/documentation/programming-guide/#combine

There are a few places that define it.

Re: [DISCUSS] Adding GroupByKeyAndSort

2019-04-17 Thread Viliam Durina
> Combine.perKey ... certainly is standardized / well-defined

Is there any document where it's defined?

Viliam

On Tue, 16 Apr 2019 at 18:27, Kenneth Knowles wrote:

> On Tue, Apr 16, 2019 at 9:18 AM Reuven Lax wrote:
>
>> A common request (especially in streaming) is to support sorting values

Re: [DISCUSS] Adding GroupByKeyAndSort

2019-04-16 Thread Kenneth Knowles
On Tue, Apr 16, 2019 at 9:18 AM Reuven Lax wrote:

> A common request (especially in streaming) is to support sorting values by
> timestamp, not by the full value.
>

On this point, I think an explicit secondary key probably addresses the need. Naively implemented, the "sort by values" use case
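To make the "explicit secondary key" idea concrete, here is a minimal in-memory Python sketch of the semantics under discussion. It is not Beam code: the function name `group_by_key_and_sort` and the `(key, (secondary_key, value))` element shape are hypothetical, chosen only for illustration. Values are grouped per key and ordered by the secondary key alone (for example an event timestamp), so the values themselves never need to be comparable.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
S = TypeVar("S")  # secondary (sort) key, e.g. an event timestamp
V = TypeVar("V")  # the value itself; need not be comparable


def group_by_key_and_sort(
    elements: Iterable[Tuple[K, Tuple[S, V]]],
) -> Dict[K, List[V]]:
    """Group values per key, ordering each group by its secondary key only.

    Hypothetical helper illustrating the proposed semantics; it is not
    part of any Beam SDK.
    """
    groups: Dict[K, List[Tuple[S, V]]] = defaultdict(list)
    for key, (secondary, value) in elements:
        groups[key].append((secondary, value))
    # Sort on the secondary key alone; sorted() is stable, so values with
    # equal secondary keys keep their arrival order.
    return {
        key: [value for _, value in sorted(pairs, key=lambda p: p[0])]
        for key, pairs in groups.items()
    }


if __name__ == "__main__":
    events = [
        ("user1", (30, "c")),
        ("user2", (5, "x")),
        ("user1", (10, "a")),
        ("user1", (20, "b")),
    ]
    print(group_by_key_and_sort(events))
    # {'user1': ['a', 'b', 'c'], 'user2': ['x']}
```

Sorting by a separate secondary key, rather than by the full value, is what lets the same shape cover both "sort by timestamp" and "sort by some field of the value" without requiring a comparator over whole elements.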

Re: [DISCUSS] Adding GroupByKeyAndSort

2019-04-16 Thread Reuven Lax
This is a good conversation.

Some things to consider:

Since Beam is cross-language, the "shufflers" can usually only sort by binary value. This is different from other systems, where custom comparators can be used for sorting. We might need to introduce OrderPreservingCoder, and mark the coders
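The point that a shuffler compares only encoded bytes can be illustrated with a minimal Python sketch of an order-preserving encoding for signed 64-bit integers (such as timestamps). This is not Beam's coder API: `encode_ordered_int64` and `decode_ordered_int64` are hypothetical names, and the offset-plus-big-endian scheme shown is one standard technique, not necessarily what an OrderPreservingCoder would use.

```python
import struct

# Shifts the signed int64 range [-2**63, 2**63) onto [0, 2**64).
_OFFSET = 1 << 63


def encode_ordered_int64(n: int) -> bytes:
    """Encode a signed 64-bit int so that bytewise order equals numeric order.

    Adding 2**63 is equivalent to flipping the two's-complement sign bit,
    and big-endian layout puts the most significant byte first, so comparing
    encodings lexicographically (as a binary-only shuffler would) gives the
    same result as comparing the integers.
    """
    if not -_OFFSET <= n < _OFFSET:
        raise ValueError("value out of int64 range")
    return struct.pack(">Q", n + _OFFSET)


def decode_ordered_int64(data: bytes) -> int:
    """Invert encode_ordered_int64."""
    (unsigned,) = struct.unpack(">Q", data)
    return unsigned - _OFFSET


if __name__ == "__main__":
    values = [5, -1, 0, 256, -300, 1]
    # Sorting by encoded bytes matches sorting by value.
    assert sorted(values, key=encode_ordered_int64) == sorted(values)

    # Naive encodings are NOT order-preserving as raw bytes:
    # little-endian puts the least significant byte first, so 1 > 256 ...
    assert struct.pack("<q", 1) > struct.pack("<q", 256)
    # ... and big-endian two's complement makes -1 (0xff...) > 1.
    assert struct.pack(">q", -1) > struct.pack(">q", 1)
```

The design constraint is that the ordering must live entirely in the byte encoding, since a cross-language shuffler cannot call a user-supplied comparator.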

Re: [DISCUSS] Adding GroupByKeyAndSort

2019-04-16 Thread Kenneth Knowles
1. This is clearly useful, and extensively used. Agree with all that. I think it can work for batch and streaming equally well if sorting is required only per "pane", though I might be overlooking something.

2. A transform need not be primitive to be well-defined and executed in a special way by