[ https://issues.apache.org/jira/browse/BEAM-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739362#comment-16739362 ]
Ismaël Mejía commented on BEAM-3494: ------------------------------------ [~mxm] do you know if this is still an issue? > Snapshot state of aggregated data of apache beam project is not maintained in > flink's checkpointing > ---------------------------------------------------------------------------------------------------- > > Key: BEAM-3494 > URL: https://issues.apache.org/jira/browse/BEAM-3494 > Project: Beam > Issue Type: Bug > Components: runner-flink > Reporter: suganya > Priority: Major > > We have a beam project which consumes events from kafka,does a groupby in a > time window(5 mins),after window elapses it pushes the events to downstream > for merge.This project is deployed using flink ,we have enabled checkpointing > to recover from failed state. > (windowsize: 5mins , checkpointingInterval: 5mins,state.backend: filesystem) > Offsets from kafka get checkpointed every 5 > mins(checkpointingInterval).Before finishing the entire DAG(groupBy and > merge) , events offsets are getting checkpointed.So incase of any restart > from task-manager ,new task gets started from last successful checkpoint ,but > we could'nt able to get the aggregated snapshot data(data from groupBy task) > from the persisted checkpoint. > Able to retrieve the last successful checkpointed offset from kafka ,but > couldnt able to get last aggregated data till checkpointing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)