Thanks Caizhi. This clarifies.

On Fri, Jan 28, 2022 at 12:06 PM Caizhi Weng <tsreape...@gmail.com> wrote:

> Hi!
>
> FileEnumerator never reads the actual content of a file. FileEnumerator
> lives in job managers and only reads the necessary metadata of the file
> (for example, how large the file is) so that it can split the work across
> all task managers. The corresponding file readers, on the other hand, live
> in task managers and perform the actual reading work. They accept the file
> splits assigned to them and read the contents corresponding to these splits.
>
> Meghajit Mazumdar <meghajit.mazum...@gojek.com> wrote on Thu, Jan 27, 2022 at 16:57:
>
>> Hello,
>>
>> I had a question about the FileSource in Flink 1.14
>> <https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/file/src/FileSource.html>.
>>
>> Considering FileSource is set to read from a remote GCS URL, I could read
>> and understand that the FileEnumerator is actually responsible for
>> discovering the files under the URL.
>>
>> However, how does the FileSource, and thus the FileEnumerator, generate
>> the splits when a remote URL is used? Does it:
>> 1. download all the files eagerly and then generate the splits, or
>> 2. only download the files and generate the splits when the source reader
>> asks for splits, or
>> 3. not download anything, but only stream the data from the remote as
>> required?
>>
>> Would be great if somebody could help me out. Thanks!
>>
>> *Regards,*
>> *Meghajit*
>>
>
--
*Regards,*
*Meghajit*
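
For anyone finding this thread later, the division of labour Caizhi describes in the quoted reply can be sketched in plain Java. This is only an illustration under stated assumptions, not Flink's implementation or API: `SplitSketch`, `RangeSplit`, `enumerate` and `read` are hypothetical names, a local temporary file stands in for the remote GCS object, and fixed-size byte ranges stand in for whatever splitting strategy the real enumerator applies.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names come from Flink.
class SplitSketch {

    // A split is just a description of work: which file, which byte range.
    static final class RangeSplit {
        final Path file;
        final long offset;
        final long length;

        RangeSplit(Path file, long offset, long length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }
    }

    // "Enumerator" side: looks only at metadata (the file size), never at the contents.
    static List<RangeSplit> enumerate(List<Path> files, long maxSplitBytes) throws IOException {
        List<RangeSplit> splits = new ArrayList<>();
        for (Path file : files) {
            long size = Files.size(file); // metadata lookup only
            for (long offset = 0; offset < size; offset += maxSplitBytes) {
                splits.add(new RangeSplit(file, offset, Math.min(maxSplitBytes, size - offset)));
            }
        }
        return splits;
    }

    // "Reader" side: opens the file and reads only the bytes of the assigned split.
    static byte[] read(RangeSplit split) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(split.file.toFile(), "r")) {
            byte[] buffer = new byte[(int) split.length];
            in.seek(split.offset);
            in.readFully(buffer);
            return buffer;
        }
    }

    public static void main(String[] args) throws IOException {
        // A local temp file stands in for a remote object (assumption for the sketch).
        Path file = Files.createTempFile("split-sketch", ".txt");
        Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));
        try {
            for (RangeSplit split : enumerate(Collections.singletonList(file), 4)) {
                String text = new String(read(split), StandardCharsets.US_ASCII);
                System.out.println(split.offset + "+" + split.length + " -> " + text);
            }
        } finally {
            Files.delete(file);
        }
    }
}
```

The point of the sketch is that `enumerate` needs only the file size to plan the work, while the bytes are fetched later by whichever reader receives a given split. That corresponds most closely to option 3 in the original question.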