I'll answer myself. I guess the most viable option for now is to wait for the work in http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Sharing-state-between-subtasks-td24489.html
On Thu, Mar 7, 2019, 3:24 PM gerardg <ger...@talaia.io> wrote: > I'm wondering if there is a way to avoid consuming too fast from partitions > that not have as much data as the other ones in the same topic by keeping > them more or less synchronized by its ingestion timestamp. Similar to what > kafka streams does: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization > > We are having an issue where partitions with less data are consumed very > fast which creates a lot of windows that can't be triggered until the > partitions with more data are consumed and the watermark gets advanced. It > seems that this issue should be quite common but we can't seem to find any > standard solution to it. Maybe is just that our partitions are too > unbalanced but still, without having a way to bound the skew between > partition (for example when processing accumulated data) it seems like a > potential source of problems. > > Anyone have an idea or suggestion to deal with this issue? > > Thanks, > > Gerard > > > > -- > Sent from: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ >