Re: Ordered PCollections eventually?

2021-05-15 Thread Kenneth Knowles
Nice pointers to the Spark feature. That's interesting. Couple thoughts:
- Totally different from per-group ordering (which fits in Beam without deep new model features).
- Related to ordered transport, actually, because that is how the ordering produced per partition would actually result in

Re: Ordered PCollections eventually?

2021-05-11 Thread Jan Lukavský
I'll just remind everyone that Beam already supports the (experimental) @RequiresTimeSortedInput, which has several limitations: mostly that it orders only by timestamp rather than by some time-related user field and, of course, that it lacks retractions. Arbitrary sorting seems to be hard, even per-key, it

Re: Ordered PCollections eventually?

2021-05-11 Thread Kenneth Knowles
Per-key ordered delivery makes a ton of sense. I'd guess CDC has the same needs as retractions, so that the changelog can be applied in order as it arrives. And since it is per-key you still get parallelism. Global ordering is quite different. I know that SQL and Dataframes have global sorting
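To illustrate the per-key point above, here is a minimal plain-Python sketch, not Beam code and not anything proposed in this thread. It assumes each change carries a per-key sequence number starting at 0 (an assumption made only for this example), and the function name `apply_changelog_per_key` is hypothetical. Changes that arrive early are buffered until the gap for their key is filled, and no key ever waits on another key, which is why per-key ordering keeps parallelism.

```python
from collections import defaultdict


def apply_changelog_per_key(events):
    """Apply (key, seq, value) changes in sequence order within each key.

    Assumes sequence numbers per key start at 0 and have no permanent gaps
    (an assumption of this sketch). Returns the final state per key and the
    order in which changes were actually applied.
    """
    state = {}                       # key -> latest applied value
    next_seq = defaultdict(int)      # key -> next sequence number expected
    pending = defaultdict(dict)      # key -> {seq: value} buffered early arrivals
    applied = []                     # (key, seq) in application order

    for key, seq, value in events:
        pending[key][seq] = value
        # Drain every buffered change for this key that is now in order.
        while next_seq[key] in pending[key]:
            s = next_seq[key]
            state[key] = pending[key].pop(s)
            applied.append((key, s))
            next_seq[key] += 1
    return state, applied


# Arrival order is interleaved across keys and out of order within key "a".
events = [
    ("a", 1, "a-v1"),
    ("b", 0, "b-v0"),
    ("a", 0, "a-v0"),
    ("b", 1, "b-v1"),
    ("a", 2, "a-v2"),
]
state, applied = apply_changelog_per_key(events)
```

Within each key the changes are applied as 0, 1, 2, while the interleaving between keys "a" and "b" is left unconstrained. A global ordering would instead require a single total order over all keys.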

Re: Ordered PCollections eventually?

2021-05-10 Thread Sam Rohde
Awesome, thanks Pablo!
On Mon, May 10, 2021 at 4:05 PM Pablo Estrada wrote:
> CDC would also benefit. I am working on a proposal for this that is
> concerned with streaming pipelines, and per-key ordered delivery. I will
> share with you as soon as I have a draft.
> Best
> -P.
>
> On Mon, May

Re: Ordered PCollections eventually?

2021-05-10 Thread Pablo Estrada
CDC would also benefit. I am working on a proposal for this that is concerned with streaming pipelines, and per-key ordered delivery. I will share with you as soon as I have a draft.
Best
-P.
On Mon, May 10, 2021 at 2:56 PM Reuven Lax wrote:
> There has been talk, but nothing concrete.
>
> On

Ordered PCollections eventually?

2021-05-10 Thread Sam Rohde
Hi All,
I was wondering whether there are any plans to add ordered PCollections to the Beam model, either now or in the future. I know that Beam SQL and Beam DataFrames would directly benefit from an ordered PCollection.
Regards,
Sam