Hi, Krzysztof

When you use *reinterpretAsKeyedStream* you must guarantee yourself that
the data is partitioned exactly the way Flink's keyBy would partition it.
But before going any further, I think we should check whether the normal
DataStream API can satisfy your requirements without using
*reinterpretAsKeyedStream*.
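
To make that constraint concrete, here is a minimal sketch of how
DataStreamUtils.reinterpretAsKeyedStream is typically called (the Event
class and its key field are placeholders, not taken from your job). The
input stream must already be spread over the subtasks exactly as keyBy on
the same key selector would spread it:

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.KeyedStream;

public class ReinterpretSketch {

    // Hypothetical event type, only for illustration.
    public static class Event {
        public String key;
        public long value;
    }

    public static KeyedStream<Event, String> reinterpret(DataStream<Event> alreadyPartitioned) {
        // The caller must guarantee that 'alreadyPartitioned' is distributed
        // across subtasks the same way keyBy(e -> e.key) would distribute it;
        // otherwise keyed state and timers end up attached to the wrong keys.
        return DataStreamUtils.reinterpretAsKeyedStream(
                alreadyPartitioned,
                event -> event.key,
                Types.STRING);
    }
}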


An operator can send its output to another operator in two ways:
one-to-one (forward) or redistributing [1]. With a one-to-one (forward)
connection, the partitioning and the order of the events stay the same
between the two operators. Two operators are connected with forward by
default if their parallelism is the same.
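
For example, a job along these lines (the socket source and the string
operations are just placeholders) keys the stream once; the downstream
operators run at the same parallelism, so they are connected one-to-one and
the per-key order established by the keyBy is preserved without keying again:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ForwardSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);

        // Placeholder source; any source works the same way.
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        lines
            .keyBy(line -> line.split(",")[0])   // the only redistribution: hash-partition by key
            .map(String::toUpperCase)            // same parallelism as the next operator
            .filter(line -> !line.isEmpty())     // connected one-to-one (forward), per-key order kept
            .print();

        env.execute("forward-sketch");
    }
}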

Without the full details, I think you could just *keyBy* once if your job
has no special needs. Or you could share what your job looks like, if that
is convenient.



[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/concepts/programming-model.html#parallel-dataflows

Best,
Guowei


KristoffSC <krzysiek.chmielew...@gmail.com> wrote on Tue, Jan 28, 2020 at 10:47 PM:

> Hi all,
> we have a use case where the order of received events matters and should be
> kept across the pipeline.
>
> Our pipeline would be parallelized. We can key the stream just after the
> Source operator, but in order to keep the ordering in the next operators we
> would still have to keep the stream keyed.
>
> Obviously we could key again and again, but this would cause some
> performance penalty.
> We were thinking about using DataStreamUtils.reinterpretAsKeyedStream
> instead.
>
> Since this is experimental functionality, I would like to ask if there is
> anyone in the community who is using this feature. Do we know about any
> open issues regarding it?
>
> Thanks,
> Krzysztof
