Fwd: Decouple Kafka partitions and Flink parallelism for ordered streams

2017-10-13 Thread Sanne de Roever
Hi Chesnay, /** Forwarding this to the group; I mistakenly replied to you directly earlier, apologies */ The side output option works in combination with setting slot sharing groups. For reference I have included a source file. The job takes three slots. One slot for input handling, and one slot

Re: Decouple Kafka partitions and Flink parallelism for ordered streams

2017-10-11 Thread Chesnay Schepler
I couldn't find a proper solution for this. The easiest solution might be to use Async I/O and do the validation with an ExecutorService or similar in the map function. I've CC'd aljoscha,
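
To illustrate the ExecutorService idea in this reply, here is a minimal sketch in plain Java, not Flink code. It assumes validation is a pure per-message check; the class `OrderedValidation` and the methods `isValid` and `validateInOrder` are hypothetical names invented for this example. Messages are validated in parallel on a thread pool, but results are collected in input order, so the stream's ordering survives. Within Flink, the Async I/O operator offers an ordered mode through `AsyncDataStream.orderedWait`, which would play the role of the collection loop below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper for illustration only; not part of Flink or the original job.
final class OrderedValidation {

    // Assumed stand-in for the real (expensive) validation logic.
    static boolean isValid(String message) {
        return !message.isEmpty();
    }

    // Validates all messages in parallel, then returns the valid ones in input order.
    static List<String> validateInOrder(List<String> messages, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every validation up front so the checks can run concurrently.
            List<CompletableFuture<Boolean>> pending = new ArrayList<>();
            for (String message : messages) {
                pending.add(CompletableFuture.supplyAsync(() -> isValid(message), pool));
            }
            // Join the futures in submission order: completion order does not matter,
            // the output order always matches the input order.
            List<String> valid = new ArrayList<>();
            for (int i = 0; i < messages.size(); i++) {
                if (pending.get(i).join()) {
                    valid.add(messages.get(i));
                }
            }
            return valid;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Prints [a, b, c]: the empty message is dropped, order is kept.
        System.out.println(validateInOrder(List.of("a", "", "b", "c"), 4));
    }
}
```

The point of the design is that the number of validation threads is independent of the number of Kafka partitions: the partition count fixes the ordering scope, while the pool size fixes the CPU parallelism.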

Re: Decouple Kafka partitions and Flink parallelism for ordered streams

2017-10-11 Thread Chesnay Schepler
It is correct that keyBy and partition operations will distribute messages over the network as they distribute the data across all subtasks. For this use-case we only want to consider subtasks that are subsequent to our operator, like a local keyBy. I don't think there is an obvious way to
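
The "local keyBy" notion mentioned here can be pictured with a small plain-Java sketch. This is not a Flink API and not something the thread proposes as a solution; `LocalKeyRouter`, `submit`, and `drain` are hypothetical names. Records are routed by key hash to one of several single-threaded lanes inside the same process, so records sharing a key are handled in arrival order without any network shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of in-process key routing; not part of Flink.
final class LocalKeyRouter {
    private final List<ExecutorService> lanes = new ArrayList<>();
    private final Map<String, List<String>> processed = new ConcurrentHashMap<>();

    LocalKeyRouter(int laneCount) {
        // One single-threaded executor per lane: each lane processes its tasks in FIFO order.
        for (int i = 0; i < laneCount; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    // Routes the record to the lane chosen by its key hash. The same key always
    // maps to the same lane, which preserves per-key ordering.
    void submit(String key, String value) {
        int lane = Math.floorMod(key.hashCode(), lanes.size());
        lanes.get(lane).execute(() ->
                processed.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(value));
    }

    // Stops accepting records, waits for the lanes to finish, and returns the
    // values seen per key in processing order.
    Map<String, List<String>> drain() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                if (!lane.awaitTermination(10, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("lane did not finish in time");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return processed;
    }
}
```

Submitting `("k1", "a")`, `("k2", "x")`, `("k1", "b")` and then calling `drain()` yields `a` before `b` under `k1`, regardless of how many lanes are configured.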

Decouple Kafka partitions and Flink parallelism for ordered streams

2017-10-11 Thread Sanne de Roever
Hi, Currently we need 75 Kafka partitions per topic and a parallelism of 75 to meet the required performance; increasing the partitions and parallelism further gives diminishing returns. Currently the performance is approx. 1500 msg/s per core, with one pipeline (source, map, sink) deployed as one instance