Re: ExpiredIteratorException when reading from a Kinesis stream

2016-10-04 Thread Tzu-Li (Gordon) Tai
Hi Steffen, Turns out that FLINK-4514 just missed Flink 1.1.2 and wasn’t included in the release (I’ll update the resolve version in JIRA to 1.1.3, thanks for noticing this!). The Flink community is going to release 1.1.3 asap, which will include the fix. If you don’t want to wait for the releas

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Josh
Hey Gordon, I've been using Flink 1.2-SNAPSHOT for the past week (with FLINK-4514) with no problems, but yesterday the Kinesis consumer started behaving strangely... My Kinesis data stream is fairly constant at around 1.5MB/sec, however the Flink Kinesis consumer started to stop consuming for peri

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Tzu-Li (Gordon) Tai
Hi Josh, That warning message was added as part of FLINK-4514. It pops out whenever a shard iterator was used after 5 minutes it was returned from Kinesis. The only time spent between after a shard iterator was returned and before it was used to fetch the next batch of records, is on deserializi

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Josh
Hi Gordon, Thanks for the fast reply! You're right about the expired iterator exception occurring just before each spike. I can't see any signs of long GC on the task managers... CPU has been <15% the whole time when the spikes were taking place and I can't see anything unusual in the task manager

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Scott Kidder
Hi Steffan & Josh, For what it's worth, I've been using the Kinesis connector with very good results on Flink 1.1.2 and 1.1.3. I updated the Flink Kinesis connector KCL and AWS SDK dependencies to the following versions: aws.sdk.version: 1.11.34 aws.kinesis-kcl.version: 1.7.0 My customizations a

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-03 Thread Stephan Ewen
Is it possible that you have stalls in your topology? Reasons could be: - The data sink blocks or becomes slow for some periods (where are you sending the data to?) - If you are using large state and a state backend that only supports synchronous checkpointing, there may be a delay introduce

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-04 Thread Josh
Hi Scott & Stephan, The problem has happened a couple more times since yesterday, it's very strange as my job was running fine for over a week before this started happening. I find that if I restart the job (and restore from the last checkpoint) it runs fine for a while (couple of hours) before br

Re: ExpiredIteratorException when reading from a Kinesis stream

2016-11-05 Thread Josh
I've reset the state and the job appears to be running smoothly again now. My guess is that this problem was somehow related to my state becoming too large (it got to around 20GB before the problem began). I would still like to get to the bottom of what caused this as resetting the job's state is n