Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-30 Thread arjun s
Hi team, I'm also interested in finding out if there is Java code available to determine the extent to which a Flink job has processed files within a directory. Additionally, I'm curious about where the details of the processed files are stored within Flink. Thanks and regards, Arjun S On Mon, 30

Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-29 Thread arjun s
Hi team, I appreciate the information provided. I'm inquiring whether there exists a method to automatically relocate processed files from a directory once a Flink job has completed processing them. I'm particularly keen on understanding how this particular use case is currently managed in product

Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-28 Thread Alexander Fedulov
> Or was it the querying of the checkpoints you were advising against? Yes, I meant the approach, not file removal itself. Mainly because how exactly FileSource stores its state is an implementation detail and there are no external guarantees for its consistency between even the minor versions. On

Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-28 Thread Andrew Otto
> This is not a robust solution, I would advise against it. Oh no? Am curious as to why not. It seems not dissimilar to how Kafka topic retention works: the messages are removed after some time period (hopefully after they are processed), so why would it be bad to remove files that are already pr

Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-27 Thread Alexander Fedulov
> I wonder if you could use this fact to query the committed checkpoints and move them away after the job is done. This is not a robust solution, I would advise against it. Best, Alexander On Fri, 27 Oct 2023 at 16:41, Andrew Otto wrote: > For moving the files: > > It will keep the files as is

Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-27 Thread Andrew Otto
For moving the files: > It will keep the files as is and remember the name of the file read in checkpointed state to ensure it doesnt read the same file twice. I wonder if you could use this fact to query the committed checkpoints and move them away after the job is done. I think it should even b

Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-26 Thread arjun s
Hi team, Thanks for your quick response. I have an inquiry regarding file processing in the event of a job restart. When the job is restarted, we encounter challenges in tracking which files have been processed and which remain pending. Is there a method to seamlessly resume processing files from w

Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-26 Thread Chirag Dewan via user
Hi Arjun, Flink's FileSource doesnt move or delete the files as of now. It will keep the files as is and remember the name of the file read in checkpointed state to ensure it doesnt read the same file twice. Flink's source API works in a way that single Enumerator operates on the JobManager. Th

Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-26 Thread Alexander Fedulov
Flink's FileSource will enumerate the files and keep track of the progress in parallel for the individual files. Depending on the format you use, the progress is tracked at the different level of granularity (TextLine being the simplest one that tracks the progress based on the number of lines proc