Flink app reading parquet files from s3 and sink to kafka topics.

Lijuan Hou Tue, 09 Jan 2024 15:58:33 -0800

Hi team, could I get some help and suggestions on how to resolve the
following issue? Thank you in advance!
I am implementing an app that reads Parquet files from S3 and sinks them to
Kafka topics.
For the S3 schema, I am using .avsc schema files that include a header
outside of the topic schema content.
I have made sure that the target topics are registered with schema repo
version 1501, and that the local S3 schema and topic schema also use
version 1501.
However, the Kafka sink fails in a different way depending on which schema it uses:


   1. With the S3 schema, there are schema errors such as "Could not
      register schema", "Schema not found", and "Not in union".
   2. With the topic schema, there is a ClassCastException: class
      java.util.HashMap cannot be cast to class
      org.apache.avro.generic.IndexedRecord

I think I may need to process the DataStream of GenericRecord read from
S3 and remove the header before sinking to Kafka. I tried using a
MapFunction to do this, but it did not work.
Is there a proper way to address this issue? Thank you!
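
For reference, the header-removal step I have in mind can be sketched with plain Avro as below. This is only a minimal sketch, not my actual job code: the class name HeaderStripper, the helper names, the record name Event, and the fields header, id and ts are all hypothetical stand-ins for the real schemas. It builds a new GenericRecord against the topic schema and copies over only the fields that schema declares.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class HeaderStripper {

    // Hypothetical topic schema: payload fields only.
    static Schema topicSchema() {
        return SchemaBuilder.record("Event").fields()
                .requiredString("id")
                .requiredLong("ts")
                .endRecord();
    }

    // Hypothetical S3 schema: the same payload plus an extra header field.
    static Schema s3Schema() {
        return SchemaBuilder.record("Event").fields()
                .requiredString("header")
                .requiredString("id")
                .requiredLong("ts")
                .endRecord();
    }

    // Build a record for the target schema, copying only the top-level
    // fields that the target declares and the source actually has.
    static GenericRecord project(GenericRecord source, Schema target) {
        GenericRecord out = new GenericData.Record(target);
        for (Schema.Field field : target.getFields()) {
            if (source.getSchema().getField(field.name()) != null) {
                out.put(field.name(), source.get(field.name()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        GenericRecord fromS3 = new GenericData.Record(s3Schema());
        fromS3.put("header", "batch-7");
        fromS3.put("id", "a1");
        fromS3.put("ts", 42L);

        GenericRecord forKafka = project(fromS3, topicSchema());
        System.out.println(forKafka);
        // Check that the projected record conforms to the topic schema.
        System.out.println(GenericData.get().validate(topicSchema(), forKafka));
    }
}
```

In the Flink job, project(...) would be called from the MapFunction<GenericRecord, GenericRecord>. I assume the mapped stream also needs its type declared explicitly, for example with .returns(new GenericRecordAvroTypeInfo(topicSchema)) from flink-avro, so that the records are not handled by a generic fallback serializer. Note that the sketch copies top-level values only, so nested record values keep the schema they were read with.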
