Thanks Caizhi. This clarifies.

On Fri, Jan 28, 2022 at 12:06 PM Caizhi Weng <tsreape...@gmail.com> wrote:

> Hi!
>
> FileEnumerator never reads the actual content of a file. FileEnumerator
> lives in the job manager and reads only the necessary metadata of each file
> (for example, how large the file is) so that it can split the work across
> all task managers. The corresponding file readers, on the other hand, live
> in task managers and perform the actual reading work. They accept the file
> splits assigned to them and read the contents corresponding to those splits.
>
> On Thu, Jan 27, 2022 at 16:57, Meghajit Mazumdar <meghajit.mazum...@gojek.com> wrote:
>
>> Hello,
>>
>> I have a question about the FileSource in Flink 1.14
>> <https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/file/src/FileSource.html>.
>>
>> Given a FileSource configured to read from a remote GCS URL, I understand
>> that the FileEnumerator is responsible for discovering the files under
>> that URL.
>>
>> However, how does the FileSource, and thus the FileEnumerator, generate
>> the splits when a remote URL is used? Does it:
>> 1. download all the files eagerly and then generate the splits, or
>> 2. download the files and generate the splits only when the source reader
>> asks for splits, or
>> 3. not download anything, and only stream the data from the remote
>> location as required?
>>
>> It would be great if somebody could help me out. Thanks!
>>
>> *Regards,*
>> *Meghajit*
>>
>
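
For anyone finding this thread later, here is a rough sketch of the division of labour Caizhi describes. It is not Flink's API: `FileSplit`, `enumerate_splits`, `read_split` and the fixed `split_size` are hypothetical names made up for illustration, and local files stand in for remote GCS objects. The point is only that the enumerating side asks for file sizes and never opens a file, while the reading side opens the file and fetches just the byte range it was assigned.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileSplit:
    """A byte range of one file (hypothetical type, not a Flink class)."""
    path: str
    offset: int
    length: int


def enumerate_splits(paths: Iterable[str], split_size: int) -> List[FileSplit]:
    """Enumerator side: uses only file metadata (the size), never opens a file."""
    splits = []
    for path in paths:
        size = os.stat(path).st_size  # metadata lookup only, no content read
        for offset in range(0, size, split_size):
            splits.append(FileSplit(path, offset, min(split_size, size - offset)))
    return splits


def read_split(split: FileSplit) -> bytes:
    """Reader side: opens the file and reads only the bytes of its own split."""
    with open(split.path, "rb") as f:
        f.seek(split.offset)
        return f.read(split.length)


def demo() -> bytes:
    """Write a small file, split it by size, and read the splits back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        with open(path, "wb") as f:
            f.write(b"0123456789")
        splits = enumerate_splits([path], split_size=4)
        # Each split could go to a different reader; here they are read in order.
        return b"".join(read_split(s) for s in splits)
```

Calling `demo()` splits a 10-byte file into ranges of 4, 4 and 2 bytes and reassembles the original content from them. Real formats may add constraints this sketch ignores, for example files that cannot be split at arbitrary byte offsets.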

-- 
*Regards,*
*Meghajit*
