Yes, that's definitely what I am about to do next, but I just thought maybe someone had seen this before.
Will post info next time it happens. (Not guaranteed to happen soon, as it didn't happen for a long time before.)

Gyula

On Wed, Jul 12, 2017, 12:13 Stefan Richter <s.rich...@data-artisans.com> wrote:

> Hi,
>
> could you introduce some logging to figure out from which method call the
> delay is introduced?
>
> Best,
> Stefan
>
> On 12.07.2017 at 11:37, Gyula Fóra <gyula.f...@gmail.com> wrote:
>
> Hi,
>
> We are using the latest version, 1.3.1.
>
> Gyula
>
> On Wed, Jul 12, 2017, 10:44 Urs Schoenenberger
> <urs.schoenenber...@tngtech.com> wrote:
>
>> Hi Gyula,
>>
>> I don't know the cause, unfortunately, but we observed a similar issue
>> on Flink 1.1.3. The problem seems to be gone after upgrading to 1.2.1.
>> Which version are you running on?
>>
>> Urs
>>
>> On 12.07.2017 09:48, Gyula Fóra wrote:
>> > Hi,
>> >
>> > I have noticed a strange behavior in one of our jobs: every once in a
>> > while, the Kafka source checkpointing time becomes extremely large
>> > compared to what it usually is. (To be very specific, it is a Kafka
>> > source chained with a stateless map operator.)
>> >
>> > Checkpointing the offsets usually takes around 10 ms, which sounds
>> > reasonable, but in some checkpoints this goes into the 3-5 minute
>> > range, practically blocking the job for that period of time. Yesterday
>> > I observed delays of even 10 minutes. First I thought that some
>> > sources might trigger checkpoints later than others, but after adding
>> > some logging and comparing, it seems that the triggerCheckpoint was
>> > received at the same time.
>> >
>> > Interestingly, only one of the 3 Kafka sources in the job seems to be
>> > affected (last time I checked, at least). We are still using the 0.8
>> > consumer with commit on checkpoints. I also don't see this happen in
>> > other jobs.
>> >
>> > Any clue on what might cause this?
>> >
>> > Thanks :)
>> > Gyula
>>
>> --
>> Urs Schönenberger - urs.schoenenber...@tngtech.com
>>
>> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
>> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
>> Sitz: Unterföhring * Amtsgericht München * HRB 135082
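
For reference, the kind of timing logging suggested above could look roughly like the following. This is a minimal sketch, assuming a hypothetical helper class (TimedCall is not part of the Flink or Kafka connector API) that you wrap around the suspect calls, such as the offset snapshot or the ZooKeeper offset commit:

// Hypothetical helper for the timing logging suggested above;
// not part of Flink or the Kafka connector.
import java.util.concurrent.Callable;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TimedCall {
    private static final Logger LOG = LoggerFactory.getLogger(TimedCall.class);

    // Wrap a suspect method call and log how long it took.
    public static <T> T timed(String label, Callable<T> call) throws Exception {
        long start = System.nanoTime();
        try {
            return call.call();
        } finally {
            long millis = (System.nanoTime() - start) / 1_000_000;
            // WARN on anything over a second so slow checkpoints stand out.
            if (millis > 1000) {
                LOG.warn("{} took {} ms", label, millis);
            } else {
                LOG.debug("{} took {} ms", label, millis);
            }
        }
    }
}

Wrapping each candidate call site, e.g. timed("offset snapshot", () -> fetcher.snapshotCurrentState()), should reveal which step accounts for the minutes; fetcher.snapshotCurrentState() here is just an illustrative stand-in for whatever method you suspect.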