Hello, I’m trying to perform a stateful mapping of some objects coming in from Kafka in a parallelized Flink job (parallelism set on the job with env.setParallelism(3)). The data source is a Kafka topic, but its partitions aren’t meaningfully keyed for this operation: each Kafka message is flatMapped to between 0 and 2 objects, potentially with different keys. I have a keyBy() operator directly before my map(), but I’m seeing objects with the same key distributed to different parallel task instances, as reported by getRuntimeContext().getIndexOfThisSubtask().
My understanding of keyBy() is that it partitions the stream by key and guarantees that all records with a given key reach the same parallel instance. Could I possibly be seeing residual “keying” left over from the Kafka topic’s partitioning? I’m running Flink 1.1.3 in Scala. Please let me know if I can add more info. Thanks, Andrew
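P.S. To make my assumption concrete, here is a simplified mental model of what I expect keyBy() to do. This is not Flink’s actual internal partitioner (I believe Flink hashes the key’s hashCode internally), and the names here are purely illustrative, but it captures the guarantee I’m relying on: routing depends only on the key, so every record with a given key should land on the same subtask index.

```scala
// Simplified model of hash-based keyBy routing (illustrative only, not
// Flink's real partitioner). The routing decision is a pure function of
// the key and the parallelism, so it is deterministic per key.
object KeyRoutingModel {
  // Map a key to a subtask index in [0, parallelism).
  // The double-modulo keeps the result non-negative even for
  // negative hashCodes.
  def subtaskFor(key: Any, parallelism: Int): Int =
    ((key.hashCode % parallelism) + parallelism) % parallelism

  def main(args: Array[String]): Unit = {
    val parallelism = 3 // matches env.setParallelism(3) in my job
    val records = Seq("a" -> 1, "b" -> 2, "a" -> 3, "c" -> 4, "a" -> 5)

    // Every occurrence of key "a" should route to the same subtask index.
    val aIndexes = records
      .collect { case (k, _) if k == "a" => subtaskFor(k, parallelism) }
      .toSet
    println(s"distinct subtask indexes for key 'a': ${aIndexes.size}")
  }
}
```

In other words, I expected the set of subtask indexes observed for any single key to have size 1, which is not what my job is showing.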