Hi Arjun,

Flink saves the file currently being processed and its corresponding
offset in Flink state [1]. To access it, you may need to use the Flink
State Processor API [2].

However, I don't think this is a good approach. I suggest adding relevant
metrics to the FileSystem connector to report the current number of pending
files for monitoring the status of file processing.
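
To make the idea of a "pending files" number concrete, here is a minimal
plain-Java sketch of the value such a metric could report. It is not part
of the FileSystem connector: the class name PendingFileCounter and the
processedNames set are hypothetical, and in a real job the set of processed
files would have to come from the source itself rather than be passed in
by hand.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical helper, not a Flink class.
final class PendingFileCounter {
    private PendingFileCounter() {}

    /**
     * Counts regular files directly under dir (non-recursive) whose file
     * name is not contained in processedNames.
     */
    static long countPending(Path dir, Set<String> processedNames) throws IOException {
        // Files.list must be closed, hence try-with-resources.
        try (Stream<Path> files = Files.list(dir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> !processedNames.contains(p.getFileName().toString()))
                .count();
        }
    }
}
```

Inside a Flink operator, a value like this could in principle be exposed
as a gauge on the operator's metric group, but a proper implementation
would belong in the connector itself.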

For details on how processed files are tracked, you can refer to the
FLIP-27 [3] design document.


[1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/PendingSplitsCheckpoint.java
[2] https://nightlies.apache.org/flink/flink-docs-release-1.18/zh/docs/libs/state_processor_api/
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface


Best,

Feng

On Tue, Oct 31, 2023 at 12:16 AM arjun s <arjunjoice...@gmail.com> wrote:

> Hi team,
> I'm also interested in finding out if there is Java code available to
> determine the extent to which a Flink job has processed files within a
> directory. Additionally, I'm curious about where the details of the
> processed files are stored within Flink.
>
> Thanks and regards,
> Arjun S
>
