Hi, could you provide a minimal version of the code for the problematic job? Also, are you sure that the keyBy is operating on the correct key attribute?
Best,
Stefan

> On 07.12.2016, at 15:57, Andrew Roberts <arobe...@fuze.com> wrote:
>
> Hello,
>
> I'm trying to perform a stateful mapping of some objects coming in from
> Kafka in a parallelized Flink job (set on the job using
> env.setParallelism(3)). The data source is a Kafka topic, but the
> partitions aren't meaningfully keyed for this operation (each Kafka
> message is flatMapped to between 0 and 2 objects, with potentially
> different keys). I have a keyBy() operator directly before my map(), but
> I'm seeing objects with the same key distributed to different parallel
> task instances, as reported by
> getRuntimeContext().getIndexOfThisSubtask().
>
> My understanding of keyBy is that it segments the stream by key and
> guarantees that all data with a given key hits the same instance. Am I
> possibly seeing residual "keying" from the Kafka topic?
>
> I'm running Flink 1.1.3 in Scala. Please let me know if I can add more
> info.
>
> Thanks,
>
> Andrew
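To illustrate why the key attribute matters: keyBy routes records by hashing the extracted key, so two records only land on the same subtask if their keys are equal and have a stable, value-based hashCode. The sketch below is a simplified, hypothetical model of hash partitioning (not Flink's actual internals); `KeyByModel` and `subtaskFor` are names invented for this example. It shows how a value-based key (a String) always maps to the same subtask, while an identity-hashed key type such as an Array usually does not, which is one common way "same key, different subtask" symptoms arise.

```scala
// Hypothetical sketch of keyBy-style hash partitioning -- NOT Flink's real code.
// A record is routed to a subtask derived from its key's hashCode, so key types
// with unstable or identity-based hashCodes can scatter equal keys.
object KeyByModel {
  // Map a key to one of `parallelism` subtasks, as a hash partitioner would.
  def subtaskFor(key: Any, parallelism: Int): Int =
    math.abs(key.hashCode % parallelism)

  def main(args: Array[String]): Unit = {
    val parallelism = 3

    // Value-based keys (String, case classes) are stable: same key, same subtask.
    val a = subtaskFor("user-42", parallelism)
    val b = subtaskFor("user-42", parallelism)
    println(s"stable key -> subtasks $a and $b")

    // Arrays use identity hashCode: equal contents, usually different subtasks.
    val x = subtaskFor(Array(1, 2, 3), parallelism)
    val y = subtaskFor(Array(1, 2, 3), parallelism)
    println(s"array key  -> subtasks $x and $y (may differ!)")
  }
}
```

If the keyBy in the job extracts something whose equals/hashCode is not value-based (arrays, or a class without a proper hashCode), this model would predict exactly the scattering Andrew describes.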