Nice pointers to the Spark feature. That's interesting. Couple thoughts:
- Totally different from per-group ordering (which fits in Beam without
deep new model features).
- Related to ordered transport, actually, because that is how the ordering
produced per partition would actually result in
I'll just remind everyone that Beam already supports the (experimental)
@RequiresTimeSortedInput annotation (which has several limitations, mostly
in that it orders only by timestamp and not by some time-related user field,
and of course it is missing retractions). Arbitrary sorting seems to be
hard, even per-key, it
Per-key ordered delivery makes a ton of sense. I'd guess CDC has the same
needs as retractions, so that the changelog can be applied in order as it
arrives. And since it is per-key you still get parallelism.
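
To make the per-key point concrete, here is a minimal sketch in plain Python
(not Beam API). The event shape (key, seq, op, value), the op names and the
function apply_changelog are all hypothetical, chosen only for illustration.
Changes are ordered within each key only, so no global sort is needed.

```python
from collections import defaultdict


def apply_changelog(events):
    """Apply CDC-style changes per key, in sequence order within each key.

    `events` is an iterable of (key, seq, op, value) tuples. This shape is
    an assumption made for this sketch, not something Beam defines.
    """
    # Group changes by key; arrival order across keys does not matter.
    per_key = defaultdict(list)
    for key, seq, op, value in events:
        per_key[key].append((seq, op, value))

    state = {}
    # Keys are independent of one another, so each iteration of this outer
    # loop could run on a different worker.
    for key, changes in per_key.items():
        # Order is enforced only inside a single key.
        for seq, op, value in sorted(changes, key=lambda change: change[0]):
            if op == "delete":
                state.pop(key, None)
            else:  # treat anything else as an upsert
                state[key] = value
    return state


events = [
    ("a", 2, "upsert", "a-v2"),
    ("b", 1, "upsert", "b-v1"),
    ("a", 1, "upsert", "a-v1"),  # arrives after seq 2 for the same key
    ("b", 2, "delete", None),
]
result = apply_changelog(events)
```

Applying the same events in arrival order would leave key "a" at the stale
value "a-v1", which is why the changelog needs ordered delivery per key.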
Global ordering is quite different. I know that SQL and DataFrames have
global sorting
Awesome, thanks Pablo!
On Mon, May 10, 2021 at 4:05 PM Pablo Estrada wrote:
CDC would also benefit. I am working on a proposal for this that is
concerned with streaming pipelines, and per-key ordered delivery. I will
share with you as soon as I have a draft.
Best
-P.
On Mon, May 10, 2021 at 2:56 PM Reuven Lax wrote:
> There has been talk, but nothing concrete.
Hi All,
I was wondering if there had been any plans for creating ordered
PCollections in the Beam model? Or if there might be plans for them in the
future?
I know that Beam SQL and Beam DataFrames would directly benefit from an
ordered PCollection.
Regards,
Sam