Hi Arjun,

Flink saves the file currently being processed, together with its offset, in Flink state [1]. To access it, you would probably need to use the Flink State Processor API [2].
However, I don't think this is a good approach. I suggest adding metrics to the FileSystem connector that report the current number of pending files, so that the status of file processing can be monitored. For details on how processed files are tracked, you can refer to the FLIP-27 design document [3].

[1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/PendingSplitsCheckpoint.java
[2] https://nightlies.apache.org/flink/flink-docs-release-1.18/zh/docs/libs/state_processor_api/
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface

Best,
Feng

On Tue, Oct 31, 2023 at 12:16 AM arjun s <[email protected]> wrote:

> Hi team,
> I'm also interested in finding out if there is Java code available to
> determine the extent to which a Flink job has processed files within a
> directory. Additionally, I'm curious about where the details of the
> processed files are stored within Flink.
>
> Thanks and regards,
> Arjun S
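
To illustrate the pending-files metric suggested in the reply above, here is a minimal sketch in plain Java. It is not the FileSystem connector's code: the class `PendingFileTracker` and its methods are hypothetical, and the nested `Gauge` interface is only a local stand-in with the same shape as Flink's `org.apache.flink.metrics.Gauge<T>` so that the example runs without Flink on the classpath. In a real job, an equivalent gauge would presumably be registered through a Flink `MetricGroup`.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: tracks which files are still pending and which are done.
class PendingFileTracker {

    // Local stand-in mirroring the shape of org.apache.flink.metrics.Gauge<T>.
    public interface Gauge<T> {
        T getValue();
    }

    private final Set<String> pending = new LinkedHashSet<>();
    private final Set<String> processed = new LinkedHashSet<>();

    // Record newly discovered files; files already processed are ignored.
    public synchronized void onDiscovered(Collection<String> paths) {
        for (String path : paths) {
            if (!processed.contains(path)) {
                pending.add(path);
            }
        }
    }

    // Mark a pending file as fully processed.
    public synchronized void onFinished(String path) {
        if (pending.remove(path)) {
            processed.add(path);
        }
    }

    public synchronized int pendingCount() {
        return pending.size();
    }

    public synchronized int processedCount() {
        return processed.size();
    }

    // Gauge that reports the current number of pending files when polled.
    public Gauge<Integer> pendingFilesGauge() {
        return this::pendingCount;
    }

    public static void main(String[] args) {
        PendingFileTracker tracker = new PendingFileTracker();
        tracker.onDiscovered(List.of("a.csv", "b.csv", "c.csv"));
        tracker.onFinished("a.csv");
        System.out.println("pending files: " + tracker.pendingFilesGauge().getValue());
    }
}
```

Because the gauge is evaluated each time it is polled, a metrics reporter would always see the current pending count without the tracker having to push updates.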
