Hi All,

I have a job as attached.

I have a 16 Core blade running RHEL 7. The taskmanager default number of
slots is set to 1. The source is a kafka stream and each of the 2
sources(topic) have 2 partitions each.


*What I notice is that when I deploy a job to run with #parallelism=2 the
total processing time doubles the time it took when the same job was
deployed with #parallelism=1. It linearly increases with the parallelism.*
Since the numberof slots is set to 1 per TM, I would assume that the job
would be processed in parallel in 2 different TMs and that each consumer in
each TM is connected to 1 partition of the topic. This therefore should
have kept the overall processing time the same or less !!!

The co-flatmap connects the 2 streams & uses ValueState (checkpointed in
FS). I think this is distributed among the TMs. My understanding is that
the search of values state could be costly between TMs.  Do you sense
something wrong here?

Best Regards
CVP

Reply via email to