We're using Avro 1.7.4 with Apache Flume 1.9.0, and we have written a Flume interceptor in Java that deserializes each event with the old schema and reserializes it with the new schema.

In the interceptor, we have the following code to deserialize:

GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(oldSchema, newSchema);
BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(event.getBody(), null);
GenericRecord record = reader.read(null, decoder);

Then the following code to reserialize:

ByteArrayOutputStream outStream = new ByteArrayOutputStream();
GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(newSchema);
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outStream, null);
writer.write(record, encoder);
encoder.flush();
event.setBody(outStream.toByteArray());

On Sat, Jan 15, 2022 at 2:33 PM Spencer Nelson <s...@spencerwnelson.com> wrote:
>
> This should work according to the spec. What language and Avro library are
> you using, and with what version?
>
> Aliases are a bit tricky to use correctly. When deserializing, you may need
> to indicate the writer’s schema as using oldFieldName1 and oldFieldName2,
> while the reader schema uses newFieldName1 and newFieldName2. In other words,
> you may need to provide both the old and new schemas to the deserializer.
> This is just built in to how aliases work
> (https://avro.apache.org/docs/current/spec.html#Aliases). This may be a little
> abstract and unclear; it’s easier to describe in the context of a particular
> language.
>
>
> On Sat, Jan 15, 2022 at 8:37 AM Spencer Lu <spencer...@gmail.com> wrote:
>>
>> Hi everyone,
>>
>> We have an application that receives Avro data, and it needs to rename
>> certain fields in the data before sending it downstream. The
>> application is using the following Avro schema to send the data
>> downstream (note that 2 of the fields have aliases defined):
>>
>> {
>>   "name":"MyCompanyRecordAvroEncoder",
>>   "aliases":["com.mycompany.avro.MyStats"],
>>   "type":"record",
>>   "fields":[
>>     {"name":"newFieldName1","type":["null", "int"],"default":null,"aliases":["oldFieldName1"]},
>>
>>     {"name":"statusRecords","type":{"type":"array","items":{"name":"StatusAvroRecord","type":"record","fields" : [
>>       {"name":"recordId","type":"long"},
>>       {"name":"recordName","type":["null", "string"],"default":null},
>>       {"name":"newFieldName2","type":["null", "string"],"default":null,"aliases":["oldFieldName2"]}
>>     ]}}, "default": []}
>>   ]
>> }
>>
>> We see that our application receives the following Avro data:
>>
>> {
>>   "oldFieldName1": 300,
>>   "statusRecords": [
>>     {
>>       "recordId": 100,
>>       "recordName": "Record1",
>>       "oldFieldName2": "{\"type\":\"XYZ\",\"properties\":{\"property1\":-1.2,\"property2\":\"Value\"}}"
>>     }
>>   ]
>> }
>>
>> Then the application sends the following Avro data downstream:
>>
>> {
>>   "newFieldName1": 300,
>>   "statusRecords": [
>>     {
>>       "recordId": 100,
>>       "recordName": "Record1",
>>       "newFieldName2": null
>>     }
>>   ]
>> }
>>
>> As you can see, newFieldName1 is aliased to oldFieldName1 and has the
>> value from oldFieldName1, so its alias is working.
>>
>> However, newFieldName2 is aliased to oldFieldName2, but it is null
>> instead of having the value from oldFieldName2, so its alias is not
>> working.
>>
>> The only difference I see between newFieldName1 and newFieldName2 is
>> that newFieldName2 is a field within an array item. Do aliases not
>> work for fields in array items? Or is there some other issue?
>>
>> Any idea how I can get the alias for newFieldName2 to work?
>>
>> Thanks,
>> Spencer
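
For reference, the renaming that the aliases are expected to perform, including inside array items, can be sketched in plain Java. This is only an illustration of the desired result and makes several assumptions: it does not use Avro at all, it models records as Maps and arrays as Lists, and the names AliasRenameSketch, rename and demo are hypothetical. It is not how Avro resolves aliases internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: records are modeled as Maps, arrays as Lists.
// No Avro classes are involved.
class AliasRenameSketch {

    // Returns a copy of 'value' in which every map key found in 'aliases'
    // (old name -> new name) is renamed, descending into nested maps and lists.
    static Object rename(Object value, Map<String, String> aliases) {
        if (value instanceof Map) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                String oldName = String.valueOf(e.getKey());
                String newName = aliases.getOrDefault(oldName, oldName);
                out.put(newName, rename(e.getValue(), aliases));
            }
            return out;
        }
        if (value instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(rename(item, aliases));
            }
            return out;
        }
        return value; // numbers, strings and nulls pass through unchanged
    }

    // Builds the received record from this thread and applies both renames.
    static Object demo() {
        Map<String, Object> item = new LinkedHashMap<>();
        item.put("recordId", 100L);
        item.put("recordName", "Record1");
        item.put("oldFieldName2",
            "{\"type\":\"XYZ\",\"properties\":{\"property1\":-1.2,\"property2\":\"Value\"}}");

        List<Object> statusRecords = new ArrayList<>();
        statusRecords.add(item);

        Map<String, Object> received = new LinkedHashMap<>();
        received.put("oldFieldName1", 300);
        received.put("statusRecords", statusRecords);

        Map<String, String> aliases = new LinkedHashMap<>();
        aliases.put("oldFieldName1", "newFieldName1");
        aliases.put("oldFieldName2", "newFieldName2");

        return rename(received, aliases);
    }
}
```

Calling AliasRenameSketch.demo() yields a map in which newFieldName1 holds 300 and the single statusRecords item carries the original JSON string under newFieldName2. Note that the sketch uses one flat alias table for every nesting level, whereas Avro scopes field aliases to the record that declares them, so it shows the expected output rather than the resolution rules.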