Hi, Harshvardhan

I think you could use a factory such as `ParquetAvroWriters.forXXXX` from `ParquetAvroWriters.java` [1]. You can also find similar classes in the package `org.apache.flink.formats.parquet`.
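The sketch below puts this together. It assumes Flink 1.13/1.14 with `flink-parquet`, `flink-avro` and `parquet-avro` on the classpath and the `flink-s3-fs-hadoop` plugin installed. It copies the `Row`s from `toDataStream` by position into Avro `GenericRecord`s and writes them with `ParquetAvroWriters.forGenericRecord`. `DateTimeBucketAssigner` then groups the files into hourly directories under an `s3://` base path.

Several names here are hypothetical: the Avro schema, the column names `id`/`clicks`, the bucket `s3://my-bucket/output`, and the class names. The positional copy also assumes the schema's field order matches the table's columns. It only handles column types Avro accepts as-is (e.g. STRING, BIGINT), not TIMESTAMP.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

import java.time.ZoneId;

public class HourlyParquetToS3 {

    // Hypothetical Avro schema; field order must match the columns of table "test".
    static final String SCHEMA_JSON = SchemaBuilder.record("TestRecord")
            .fields()
            .requiredString("id")
            .requiredLong("clicks")
            .endRecord()
            .toString();

    // Hourly bucket directories, e.g. dt=2021-09-27/hr=02 (java.time pattern).
    static final String HOURLY_FORMAT = "'dt='yyyy-MM-dd'/hr='HH";

    /** Copies each Row field, by position, into an Avro GenericRecord. */
    static class RowToGenericRecord extends RichMapFunction<Row, GenericRecord> {
        // Parsed in open() so the function itself only ships the schema string.
        private transient Schema schema;

        @Override
        public void open(Configuration parameters) {
            schema = new Schema.Parser().parse(SCHEMA_JSON);
        }

        @Override
        public GenericRecord map(Row row) {
            GenericRecord record = new GenericData.Record(schema);
            for (Schema.Field field : schema.getFields()) {
                record.put(field.name(), row.getField(field.pos()));
            }
            return record;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Bulk formats such as Parquet only finalize part files on checkpoints.
        env.enableCheckpointing(60_000);

        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
        // Assumes the Kafka table "test" has already been registered (e.g. via CREATE TABLE).
        Table table = tableEnv.sqlQuery("SELECT * FROM test");
        DataStream<Row> stream = tableEnv.toDataStream(table);

        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        DataStream<GenericRecord> records = stream
                .map(new RowToGenericRecord())
                // Use Avro serialization for GenericRecord instead of Kryo.
                .returns(new GenericRecordAvroTypeInfo(schema));

        FileSink<GenericRecord> sink = FileSink
                .forBulkFormat(new Path("s3://my-bucket/output"),
                        ParquetAvroWriters.forGenericRecord(schema))
                .withBucketAssigner(new DateTimeBucketAssigner<>(HOURLY_FORMAT, ZoneId.of("UTC")))
                .build();

        records.sinkTo(sink);
        env.execute("kafka-to-s3-parquet");
    }
}
```

`DateTimeBucketAssigner` buckets by processing time. Records therefore land in the hour in which they are written, not the hour of an event timestamp. Event-time partitions would need a custom `BucketAssigner`.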
[1] https://github.com/apache/flink/blob/master/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/ParquetAvroWriters.java

Best,
Guowei

On Mon, Sep 27, 2021 at 2:36 AM Harshvardhan Shinde <harshvardhan.shi...@oyorooms.com> wrote:
> Hi,
>
> Thanks for the response.
>
> How can this streaming data be written to S3, i.e. where is the output path given?
> Also, I see that the FileSink takes GenericRecord, so how can the
> DataStream be converted to GenericRecords?
>
> Please bear with me if my questions don't make any sense.
>
> On Sun, Sep 26, 2021 at 9:12 AM Guowei Ma <guowei....@gmail.com> wrote:
>
>> Hi, Harshvardhan
>>
>> I think Caizhi is right.
>> I only have a small addition. Since you want to convert the
>> Table to a DataStream, you can look at FileSink (ParquetWriterFactory) [1].
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/datastream/file_sink/#bulk-encoded-formats
>>
>> Best,
>> Guowei
>>
>>
>> On Sun, Sep 26, 2021 at 10:31 AM Caizhi Weng <tsreape...@gmail.com>
>> wrote:
>>
>>> Hi!
>>>
>>> Try the PARTITIONED BY clause. See
>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/table/formats/parquet/
>>>
>>> Harshvardhan Shinde <harshvardhan.shi...@oyorooms.com> wrote on Fri, Sep 24, 2021
>>> at 5:52 PM:
>>>
>>>> Hi,
>>>> I wanted to know if we can write streaming data to S3 in Parquet format
>>>> with partitioning.
>>>> Here's what I want to achieve:
>>>> I have a Kafka table which gets updated with the data from a Kafka topic,
>>>> and I'm using a SELECT statement to get the data into a Table and converting
>>>> it into a stream as:
>>>>
>>>> StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
>>>> Table table = tableEnv.sqlQuery("Select * from test");
>>>> DataStream<Row> stream = tableEnv.toDataStream(table);
>>>>
>>>> Now I want to write this stream to S3 as Parquet files with hourly
>>>> partitions.
>>>>
>>>> Here are my questions:
>>>> 1. Is this possible?
>>>> 2. If yes, how can it be achieved? A link to the appropriate documentation would also help.
>>>>
>>>> Thanks and Regards,
>>>> Harshvardhan
>
> --
> Thanks and Regards,
> Harshvardhan
> Data Platform
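Caizhi suggested the Table API route, which skips the Row-to-GenericRecord conversion entirely. The sketch below assumes the filesystem SQL connector with the Parquet format. `PARTITIONED BY (dt, hr)` creates the hourly partitions, and `DATE_FORMAT` derives the partition values from a timestamp column.

Several details are illustrative assumptions: the sink table name, its columns, the S3 path, and the commit options. The Kafka table `test` is assumed to be registered already, with an `id` STRING column and an `event_time` TIMESTAMP column.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class PartitionedParquetSink {

    // Hypothetical sink table; dt/hr are partition columns, so files land under
    // s3://my-bucket/output/dt=.../hr=.../
    static final String SINK_DDL =
            "CREATE TABLE s3_sink (\n"
            + "  id STRING,\n"
            + "  event_time TIMESTAMP(3),\n"
            + "  dt STRING,\n"
            + "  hr STRING\n"
            + ") PARTITIONED BY (dt, hr) WITH (\n"
            + "  'connector' = 'filesystem',\n"
            + "  'path' = 's3://my-bucket/output',\n"
            + "  'format' = 'parquet',\n"
            + "  'sink.partition-commit.delay' = '1 h',\n"
            + "  'sink.partition-commit.policy.kind' = 'success-file'\n"
            + ")";

    // Derives the partition values from the (assumed) event_time column of "test".
    static final String INSERT_SQL =
            "INSERT INTO s3_sink "
            + "SELECT id, event_time, "
            + "DATE_FORMAT(event_time, 'yyyy-MM-dd'), DATE_FORMAT(event_time, 'HH') "
            + "FROM test";

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The filesystem sink only commits files (and partitions) on checkpoints.
        env.enableCheckpointing(60_000);
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // Assumes the Kafka table "test" has already been registered via CREATE TABLE.
        tableEnv.executeSql(SINK_DDL);
        // executeSql submits the INSERT job; await() blocks so a local run keeps going.
        tableEnv.executeSql(INSERT_SQL).await();
    }
}
```

With the default process-time commit trigger and a one-hour delay, each hourly partition is marked complete with a `_SUCCESS` file roughly an hour after it is first written.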