Hello, I’m trying to perform a stateful mapping of some objects coming in from Kafka in a parallelized Flink job (parallelism set on the job with env.setParallelism(3)). The data source is a Kafka topic, but its partitions aren’t meaningfully keyed for this operation: each Kafka message is flatMapped to between 0 and 2 objects, potentially with different keys. I have a keyBy() operator directly before my map(), but I’m seeing objects with the same key distributed to different parallel task instances, as reported by getRuntimeContext().getIndexOfThisSubtask().
My understanding of keyBy() is that it partitions the stream by key and guarantees that all records with a given key reach the same parallel instance. Could I possibly be seeing residual “keying” left over from the Kafka topic’s partitioning? I’m running Flink 1.1.3 in Scala. Please let me know if I can add more info. Thanks, Andrew
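P.S. To make my assumption concrete, here is a simplified mental model of what I expect keyBy() to do. This is not Flink’s actual internal partitioner (I believe Flink hashes the key’s hashCode internally), and the names here are purely illustrative, but it captures the guarantee I’m relying on: routing depends only on the key, so every record with a given key should land on the same subtask index.

```scala
// Simplified model of hash-based keyBy routing (illustrative only, not
// Flink's real partitioner). The routing decision is a pure function of
// the key and the parallelism, so it is deterministic per key.
object KeyRoutingModel {
  // Map a key to a subtask index in [0, parallelism).
  // The double-modulo keeps the result non-negative even for
  // negative hashCodes.
  def subtaskFor(key: Any, parallelism: Int): Int =
    ((key.hashCode % parallelism) + parallelism) % parallelism

  def main(args: Array[String]): Unit = {
    val parallelism = 3 // matches env.setParallelism(3) in my job
    val records = Seq("a" -> 1, "b" -> 2, "a" -> 3, "c" -> 4, "a" -> 5)

    // Every occurrence of key "a" should route to the same subtask index.
    val aIndexes = records
      .collect { case (k, _) if k == "a" => subtaskFor(k, parallelism) }
      .toSet
    println(s"distinct subtask indexes for key 'a': ${aIndexes.size}")
  }
}
```

In other words, I expected the set of subtask indexes observed for any single key to have size 1, which is not what my job is showing.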