If the data does not have a key (or you do not care about it) you can also
use a FlatMapFunction (or ProcessFunction) with Operator State. Operator
State is not bound to a key but to a parallel operator instance. Have a
look at the ListCheckpointed interface and its JavaDocs.

2017-06-23 18:27 GMT+02:00 Edward <egb...@hotmail.com>:

> So there is no way to do a countWindow(100) and preserve data locality?
>
> My use case is this: augment a data stream with new fields from DynamoDB
> lookup. DynamoDB allows batch get's of up to 100 records, so I am trying to
> collect 100 records before making that call. I have no other reason to do a
> repartitioning, so I am hoping to avoid incurring the cost of shipping all
> the data across the network to do this.
>
> If I use countWindowAll, I am limited to parallelism = 1, so all data gets
> repartitioned twice. And if I use keyBy().countWindow(), then it gets
> repartitioned by key. So in both cases I lose locality.
>
> Am I missing any other options?
>
>
>
> --
> View this message in context: http://apache-flink-user-
> mailing-list-archive.2336050.n4.nabble.com/Strange-behavior-of-DataStream-
> countWindow-tp7482p13981.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
>

Reply via email to