Hi, thanks. I was able to get this working, though I had to use FileSource.forRecordFileFormat.
Is there a performance difference between FileRecordFormat <https://nightlies.apache.org/flink/flink-docs-release-1.14/api/java/org/apache/flink/connector/file/src/reader/FileRecordFormat.html> and BulkFormat <https://nightlies.apache.org/flink/flink-docs-release-1.14/api/java/org/apache/flink/connector/file/src/reader/BulkFormat.html>?

Thanks,
Megh

On Fri, Dec 10, 2021 at 2:48 PM Roman Khachatryan <ro...@apache.org> wrote:
> Hi,
>
> Have you tried constructing a Hybrid source from a File source created
> with FileSource.forBulkFileFormat [1] and "gs://bucket" scheme [2]
> directly?
>
> [1] https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/file/src/FileSource.html#forBulkFileFormat-org.apache.flink.connector.file.src.reader.BulkFormat-org.apache.flink.core.fs.Path...-
> [2] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/gcs/
>
> Regards,
> Roman
>
> On Thu, Dec 9, 2021 at 1:04 PM Meghajit Mazumdar
> <meghajit.mazum...@gojek.com> wrote:
> >
> > Hello,
> >
> > We have a requirement as follows:
> >
> > We want to stream events from two sources: Parquet files stored in a
> > GCS bucket, and a Kafka topic.
> > With the release of Hybrid Source in Flink 1.14, we were able to
> > construct a Hybrid Source that produces events from two sources: a
> > FileSource which reads data from a locally saved Parquet file, and a
> > KafkaSource consuming events from a remote Kafka broker.
> >
> > I was wondering whether, instead of using a local Parquet file, it is
> > possible to stream the file directly from a GCS bucket and construct a
> > File Source out of it at runtime. The Parquet files are quite big and
> > it is a bit expensive to download them.
> >
> > Does Flink have such a functionality? Or has anyone come across such a
> > use case previously? Would greatly appreciate some help on this.
> >
> > Looking forward to hearing from you.
> >
> > Thanks,
> > Megh
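
For anyone following this thread, the setup Roman suggested might be sketched roughly as below (untested, against Flink 1.14 APIs). The bucket path, topic name, schema, and the `RowDataKafkaDeserializer` placeholder are illustrative assumptions, not working values; both legs of a HybridSource must produce the same record type (here RowData), so the Kafka deserializer is something you would implement yourself:

```java
// Rough sketch: HybridSource reading Parquet from GCS first, then Kafka.
// Assumes the flink-gs-fs-hadoop filesystem plugin is installed so that
// "gs://" paths resolve. Schema, paths, and topic names are placeholders.
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;

public class GcsParquetThenKafka {
    public static void main(String[] args) throws Exception {
        // Illustrative schema of the Parquet files; replace with the real one.
        RowType rowType = RowType.of(
                new LogicalType[] {new IntType(), new VarCharType(VarCharType.MAX_LENGTH)},
                new String[] {"id", "name"});

        // BulkFormat that reads Parquet into columnar RowData batches.
        ParquetColumnarRowInputFormat<FileSourceSplit> parquetFormat =
                new ParquetColumnarRowInputFormat<>(
                        new org.apache.hadoop.conf.Configuration(), // GCS connector settings go here
                        rowType,
                        500,    // batch size
                        false,  // isUtcTimestamp
                        true);  // isCaseSensitive

        // Read directly from the bucket instead of a locally downloaded file.
        FileSource<RowData> fileSource =
                FileSource.forBulkFileFormat(parquetFormat, new Path("gs://my-bucket/events/"))
                        .build();

        // The Kafka leg must emit the same type as the file leg;
        // RowDataKafkaDeserializer is a hypothetical user-supplied class.
        KafkaSource<RowData> kafkaSource =
                KafkaSource.<RowData>builder()
                        .setBootstrapServers("broker:9092")
                        .setTopics("events")
                        .setStartingOffsets(OffsetsInitializer.earliest())
                        .setDeserializer(new RowDataKafkaDeserializer())
                        .build();

        // File source runs to completion, then the Kafka source takes over.
        HybridSource<RowData> hybrid =
                HybridSource.builder(fileSource).addSource(kafkaSource).build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromSource(hybrid, WatermarkStrategy.noWatermarks(), "parquet-then-kafka")
                .print();
        env.execute();
    }
}
```

Note that the ParquetColumnarRowInputFormat constructor arguments shown are from the 1.14 flink-parquet module and may differ in other versions, and that switching from forRecordFileFormat to forBulkFileFormat changes only how records are decoded, not how splits are assigned.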