The upload of the schema through Avro(avro_schema) worked, but I had to
pick a single type out of the union to put in Schema.field(field_type)
inside t_env.connect(). If my dict has long and double values and I declare
Schema.field(DataTypes.DOUBLE()), all the integer values are cast to double.
My maps will also contain string values, and with that configuration the
job crashes.
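
For concreteness, this is roughly my declaration, assuming the 1.10-era
descriptor API; the Kafka connector settings and all names are placeholders,
and Avro().avro_schema(...) is the Avro(avro_schema) step mentioned above:

    from pyflink.table import DataTypes
    from pyflink.table.descriptors import Avro, Kafka, Schema

    avro_schema = """
    {
      "type": "record",
      "name": "Event",
      "fields": [
        {"name": "metrics",
         "type": {"type": "map", "values": ["long", "double", "string"]}}
      ]
    }
    """

    # t_env is the StreamTableEnvironment created elsewhere.
    t_env.connect(
            Kafka()
            .version("universal")
            .topic("events")  # placeholder topic
            .property("bootstrap.servers", "localhost:9092")) \
        .with_format(Avro().avro_schema(avro_schema)) \
        .with_schema(
            Schema()
            # the union cannot be expressed here, so one value type
            # has to be picked for the whole map:
            .field("metrics", DataTypes.MAP(DataTypes.STRING(),
                                            DataTypes.DOUBLE()))) \
        .in_append_mode() \
        .create_temporary_table("sink")  # placeholder table name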

Is there any workaround? If not, I thought of serializing the map inside
the UDTF using the Python avro library and sending it to the sink as bytes.
The problem is that every serialization format alters the original message:
the CSV format base64-encodes bytes, the JSON format wraps the binary as
the value of a key/value pair, and the Avro format adds 3 bytes at the
beginning of the message.
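
The UDTF part would look something like this sketch; the function name,
the JSON input, and the schema are just illustrations (avro.schema.parse
is spelled Parse in the avro-python3 package):

    import io
    import json

    import avro.io
    import avro.schema
    from pyflink.table import DataTypes
    from pyflink.table.udf import udtf

    # The union the table schema cannot express, handled by avro instead.
    MAP_SCHEMA = avro.schema.parse(
        '{"type": "map", "values": ["long", "double", "string"]}')

    @udtf(input_types=DataTypes.STRING(), result_types=DataTypes.BYTES())
    def to_avro_bytes(raw):
        # illustrative input: a JSON string carrying the mixed-type map
        metrics = json.loads(raw)
        buf = io.BytesIO()
        # DatumWriter.write emits only the raw binary datum (no container
        # header), so the reader has to know the schema already
        avro.io.DatumWriter(MAP_SCHEMA).write(
            metrics, avro.io.BinaryEncoder(buf))
        yield (buf.getvalue(),)

Even then, the bytes only reach the wire intact if the sink format passes
them through untouched, which is exactly the problem above.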

Thanks,
Rodrigo


