Hi,

I have noticed some strange behavior in one of our jobs: every once in a while
the Kafka source checkpointing time becomes extremely large compared to
what it usually is. (To be precise, it is a Kafka source chained with
a stateless map operator.)

Checkpointing the offsets usually takes around 10 ms, which sounds
reasonable, but in some checkpoints this goes into the 3-5 minute range,
practically blocking the job for that period of time. Yesterday I even
observed 10-minute delays. At first I thought that some sources might
trigger checkpoints later than others, but after adding some logging and
comparing the output, it seems that triggerCheckpoint was received
at the same time.

Interestingly, only one of the 3 Kafka sources in the job seems to be
affected (the last time I checked, at least). We are still using the 0.8
consumer with commit on checkpoints. Also, I don't see this happening in
other jobs.

Any clue on what might cause this?

Thanks :)
Gyula
