Hi Jiawei,

I agree that the offset management mechanism uses the same code as the Kinesis
Stream Consumer and in theory should not lose exactly-once semantics. As Ying
alluded to, if your application is restarted and snapshotting is disabled in
AWS, there is a chance that records can be lost between runs. However, if
snapshotting is enabled, the application should continue consuming from the
last processed sequence number.
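
For reference, in a standalone Flink job the sequence numbers only survive a
restart if checkpointing is enabled; a minimal sketch below (the interval and
mode are illustrative, and in a Kinesis Data Analytics application the
equivalent knob is the service-side checkpoint/snapshot configuration rather
than code):

    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointingExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Checkpoint every 60s in exactly-once mode so the consumer's
            // per-shard sequence numbers are included in every snapshot; on
            // restore the source resumes from the last checkpointed position
            // rather than from the configured initial position.
            env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);

            // ... add sources/operators here, then call
            // env.execute("checkpointing-example");
        }
    }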

I am happy to take a deeper look if you can provide more information/logs/code.

Thanks,

From: Ying Xu <y...@lyft.com>
Date: Monday, 14 September 2020 at 19:48
To: Andrey Zagrebin <azagre...@apache.org>
Cc: Jiawei Wu <wujiawei5837...@gmail.com>, user <user@flink.apache.org>
Subject: RE: [EXTERNAL] Flink DynamoDB stream connector losing records


Hi Jiawei:

Sorry for the delayed reply. When you mention certain records getting skipped,
is that within the same run or across different runs? Could you share any more
specific details on how/when the records are lost?

FlinkDynamoDBStreamsConsumer is built on top of FlinkKinesisConsumer, with a
similar offset management mechanism. In theory it shouldn't lose exactly-once
semantics in the case of getting throttled. We haven't run it in any AWS
Kinesis Analytics environment, though.
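
For context, a minimal sketch of how the consumer is typically configured (the
region and initial position below are just placeholders); STREAM_INITIAL_POSITION
is only honored on a fresh start, while a restore from a checkpoint or savepoint
continues from the stored sequence numbers:

    import java.util.Properties;

    import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;
    import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

    public class DynamoDbStreamsConsumerConfig {

        // FlinkDynamoDBStreamsConsumer accepts the same property keys as
        // FlinkKinesisConsumer, since it is built on top of it.
        public static Properties consumerProperties() {
            Properties config = new Properties();
            // Placeholder region.
            config.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1");
            // Only honored on a fresh start; after a restore the consumer
            // continues from the checkpointed sequence numbers.
            config.setProperty(
                    ConsumerConfigConstants.STREAM_INITIAL_POSITION, "TRIM_HORIZON");
            return config;
        }
    }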

Thanks.


On Thu, Sep 10, 2020 at 7:51 AM Andrey Zagrebin <azagre...@apache.org> wrote:
Generally speaking, this should not be a problem for exactly-once semantics,
but I am not familiar with DynamoDB and its Flink connector.
Did you observe any failover in the Flink logs?

On Thu, Sep 10, 2020 at 4:34 PM Jiawei Wu <wujiawei5837...@gmail.com> wrote:
I also suspect I have been throttled by the DynamoDB stream. I contacted AWS
support but got no response other than a suggestion to increase WCU and RCU.

Is it possible that Flink will lose exactly-once semantics when throttled?
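
For reference, a sketch of the consumer-side GetRecords retry/backoff settings
that relate to throttling (all values below are illustrative, not what we run):

    import java.util.Properties;

    import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

    public class GetRecordsBackoffConfig {

        // Backoff/retry knobs for the consumer's GetRecords calls. Throttled
        // calls are retried with backoff, so these values only change how
        // aggressively the shards are polled. All numbers are illustrative.
        public static Properties throttlingTunedProperties() {
            Properties config = new Properties();
            config.setProperty(
                    ConsumerConfigConstants.SHARD_GETRECORDS_INTERVAL_MILLIS, "1000");
            config.setProperty(
                    ConsumerConfigConstants.SHARD_GETRECORDS_RETRIES, "10");
            config.setProperty(
                    ConsumerConfigConstants.SHARD_GETRECORDS_BACKOFF_BASE, "500");
            config.setProperty(
                    ConsumerConfigConstants.SHARD_GETRECORDS_BACKOFF_MAX, "10000");
            return config;
        }
    }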

On Thu, Sep 10, 2020 at 10:31 PM Jiawei Wu <wujiawei5837...@gmail.com> wrote:
Hi Andrey,

Thanks for your suggestion, but I'm using a Kinesis Analytics application,
which supports only Flink 1.8...

Regards,
Jiawei

On Thu, Sep 10, 2020 at 10:13 PM Andrey Zagrebin <azagre...@apache.org> wrote:
Hi Jiawei,

Could you try the latest Flink release, 1.11?
1.8 will probably not get bugfix releases.
I will cc Ying Xu, who might have a better idea about the DynamoDB source.

Best,
Andrey

On Thu, Sep 10, 2020 at 3:10 PM Jiawei Wu <wujiawei5837...@gmail.com> wrote:
Hi,

I'm using an AWS Kinesis Analytics application with Flink 1.8 and the
FlinkDynamoDBStreamsConsumer to consume DynamoDB stream records. But recently
I found that my internal state is wrong.

After printing some logs, I found that some DynamoDB stream records are
skipped and not consumed by Flink. Has anyone encountered the same issue
before? Or is it a known issue in Flink 1.8?
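
For reference, the setup looks roughly like this; a sketch only, not the exact
production code, with the stream identifier, region, and schema as placeholders:

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kinesis.FlinkDynamoDBStreamsConsumer;
    import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;

    public class DynamoDbStreamsJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            Properties config = new Properties();
            // Placeholder region.
            config.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1");

            // "ddb-stream-id" is a placeholder for the identifier (ARN) of the
            // table's DynamoDB stream.
            env.addSource(new FlinkDynamoDBStreamsConsumer<>(
                            "ddb-stream-id", new SimpleStringSchema(), config))
               .print(); // the real job keeps internal state downstream
            env.execute("dynamodb-streams-job");
        }
    }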

Thanks,
Jiawei
