Hello,
I am writing a small application to understand how checkpointing and recovery
work in Flink.
Here is my setup.
- Flink 3-node cluster (1 JM, 2 TM) built from the latest GitHub codebase
- HDP 2.4 deployed with just HDFS, used for checkpoints and the sink
- Kafka 0.9.x
The sample code pulls from a Kafka topic (1 partition), does some 
transformation, and sinks it to HDFS (RollingSink). For now, I am rolling the 
sink file at every 1 KB, and a checkpoint is triggered every second.
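For reference, the job wiring looks roughly like this (a minimal sketch, not my exact code; it assumes the Flink 1.x-era FlinkKafkaConsumer09 and RollingSink APIs, and the broker address, topic name, group id, and HDFS path are placeholders):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(1000); // checkpoint every second

Properties props = new Properties();
props.setProperty("bootstrap.servers", "broker:9092"); // placeholder
props.setProperty("group.id", "seq-test");             // placeholder

DataStream<String> stream = env.addSource(
        new FlinkKafkaConsumer09<>("seq-topic", new SimpleStringSchema(), props));

RollingSink<String> sink = new RollingSink<String>("hdfs:///flink/sink")
        .setBatchSize(1024); // roll the part file at every 1 KB

stream.map(s -> s) // placeholder for the transformation
      .addSink(sink);

env.execute("checkpoint-recovery-test");
```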
The messages pumped into Kafka are just sequence numbers 1, 2, 3, ..., N.
Below are my observations from the setup.
1) Flink uses the checkpoint location on HDFS to maintain the checkpoint 
information.
2) The sink works properly under a normal run (read: no TM failures).
Questions:
1) How do I find the Kafka topic/partition offset details that Flink 
maintains in a checkpoint (in a readable format)?
2) When I manually simulate TM failures, I sometimes see duplicate data. I 
was expecting the exactly-once mechanism to work but found some duplicates. 
How do I validate whether exactly-once is working or not?
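Since the messages are just sequence numbers, one way I could validate this end to end is to read the sink output back after recovery and check it for duplicates and gaps. A minimal sketch in plain Java (the class and method names are mine; it assumes the sequence numbers have already been collected into a list, one per sink record):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ExactlyOnceCheck {
    // Duplicates mean at-least-once (more-than-once) delivery; gaps mean
    // lost records. An empty result means 1..N arrived exactly once.
    static List<String> findViolations(List<Long> seen) {
        List<String> violations = new ArrayList<>();
        Set<Long> unique = new TreeSet<>();
        for (long v : seen) {
            if (!unique.add(v)) {
                violations.add("duplicate: " + v);
            }
        }
        long expected = 1;
        for (long v : unique) { // TreeSet iterates in ascending order
            while (expected < v) {
                violations.add("gap: " + expected);
                expected++;
            }
            expected = v + 1;
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println(findViolations(Arrays.asList(1L, 2L, 2L, 4L)));
        // -> [duplicate: 2, gap: 3]
    }
}
```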
3) How can I simulate and verify backpressure? I have introduced some delay 
(Thread.sleep) in the job before the sink, but the "Back Pressure" tab in the 
UI does not give any indication of whether backpressure is occurring or not.
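The delay I introduced is essentially an identity map that sleeps per record to throttle the downstream rate (a sketch; the 100 ms value is arbitrary):

```java
stream.map(new MapFunction<String, String>() {
    @Override
    public String map(String value) throws Exception {
        Thread.sleep(100); // throttle here to provoke backpressure upstream
        return value;
    }
}).addSink(sink);
```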
Appreciate your thoughts.
Regards,
Vijay
