We're using Avro 1.7.4 with Apache Flume 1.9.0, and have written a
Flume interceptor in Java that deserializes each event body with the
old schema and reserializes it with the new schema.

In the interceptor, we have the following code to deserialize:

GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(oldSchema, newSchema);
BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(event.getBody(), null);
GenericRecord record = reader.read(null, decoder);

Then the following code to reserialize:

ByteArrayOutputStream outStream = new ByteArrayOutputStream();
GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(newSchema);
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outStream, null);
writer.write(record, encoder);
encoder.flush();
event.setBody(outStream.toByteArray());
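
For anyone who wants to reproduce this outside Flume, here is a minimal
sketch of the same round trip. The old schema in it is a simplified,
hypothetical stand-in (the real old schema is not shown in this thread),
the class and method names are made up for illustration, and it assumes
the Avro Java library is on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class NestedAliasCheck {

    // Hypothetical, simplified writer (old) schema; an assumption, not the real one.
    static final String OLD_SCHEMA =
        "{\"type\":\"record\",\"name\":\"MyStats\",\"fields\":["
      + "{\"name\":\"oldFieldName1\",\"type\":[\"null\",\"int\"],\"default\":null},"
      + "{\"name\":\"statusRecords\",\"type\":{\"type\":\"array\",\"items\":"
      + "{\"type\":\"record\",\"name\":\"StatusAvroRecord\",\"fields\":["
      + "{\"name\":\"recordId\",\"type\":\"long\"},"
      + "{\"name\":\"recordName\",\"type\":[\"null\",\"string\"],\"default\":null},"
      + "{\"name\":\"oldFieldName2\",\"type\":[\"null\",\"string\"],\"default\":null}"
      + "]}},\"default\":[]}]}";

    // Simplified reader (new) schema: same shape, with aliases pointing at the old names.
    static final String NEW_SCHEMA =
        "{\"type\":\"record\",\"name\":\"MyCompanyRecordAvroEncoder\","
      + "\"aliases\":[\"MyStats\"],\"fields\":["
      + "{\"name\":\"newFieldName1\",\"type\":[\"null\",\"int\"],\"default\":null,"
      + "\"aliases\":[\"oldFieldName1\"]},"
      + "{\"name\":\"statusRecords\",\"type\":{\"type\":\"array\",\"items\":"
      + "{\"type\":\"record\",\"name\":\"StatusAvroRecord\",\"fields\":["
      + "{\"name\":\"recordId\",\"type\":\"long\"},"
      + "{\"name\":\"recordName\",\"type\":[\"null\",\"string\"],\"default\":null},"
      + "{\"name\":\"newFieldName2\",\"type\":[\"null\",\"string\"],\"default\":null,"
      + "\"aliases\":[\"oldFieldName2\"]}"
      + "]}},\"default\":[]}]}";

    /** Writes one record with the old schema, reads it back with old + new, returns newFieldName2. */
    static String roundTrip() {
        // Separate parsers, because both schemas define a record named StatusAvroRecord.
        Schema oldSchema = new Schema.Parser().parse(OLD_SCHEMA);
        Schema newSchema = new Schema.Parser().parse(NEW_SCHEMA);
        try {
            // Build a sample record using the old field names.
            Schema oldItem = oldSchema.getField("statusRecords").schema().getElementType();
            GenericRecord status = new GenericData.Record(oldItem);
            status.put("recordId", 100L);
            status.put("recordName", "Record1");
            status.put("oldFieldName2", "XYZ");
            GenericRecord input = new GenericData.Record(oldSchema);
            input.put("oldFieldName1", 300);
            input.put("statusRecords", Collections.singletonList(status));

            // Serialize with the old (writer) schema.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(oldSchema).write(input, encoder);
            encoder.flush();

            // Deserialize, passing both the writer and the reader schema.
            GenericDatumReader<GenericRecord> reader =
                new GenericDatumReader<>(oldSchema, newSchema);
            BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            GenericRecord converted = reader.read(null, decoder);

            // Look up the renamed field inside the first array item.
            List<?> statuses = (List<?>) converted.get("statusRecords");
            GenericRecord first = (GenericRecord) statuses.get(0);
            return String.valueOf(first.get("newFieldName2"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("newFieldName2 = " + roundTrip());
    }
}
```

In this sketch the nested record has the same full name
(StatusAvroRecord) in both schemas. As far as we can tell, the Java
library looks up field aliases by the name of the enclosing record, so
if the real old schema gives the nested record a different full name
(for example through a namespace), that may be worth checking.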


On Sat, Jan 15, 2022 at 2:33 PM Spencer Nelson <s...@spencerwnelson.com> wrote:
>
> This should work according to the spec. What language and Avro library are 
> you using, and with what version?
>
> Aliases are a bit tricky to use correctly. When deserializing, you may need 
> to indicate the writer’s schema as using oldFieldName1 and oldFieldName2, 
> while the reader schema uses newFieldName1 and newFieldName2. In other words, 
> you may need to provide both the old and new schemas to the deserializer. 
> This is just built in to how aliases work (
> https://avro.apache.org/docs/current/spec.html#Aliases). This may be a little 
> abstract and unclear; it’s easier to describe in the context of a particular 
> language.
>
>
> On Sat, Jan 15, 2022 at 8:37 AM Spencer Lu <spencer...@gmail.com> wrote:
>>
>> Hi everyone,
>>
>> We have an application that receives Avro data, and it needs to rename
>> certain fields in the data before sending it downstream. The
>> application is using the following Avro schema to send the data
>> downstream (note that 2 of the fields have aliases defined):
>>
>> {
>>     "name":"MyCompanyRecordAvroEncoder",
>>     "aliases":["com.mycompany.avro.MyStats"],
>>     "type":"record",
>>     "fields":[
>>         {"name":"newFieldName1","type":["null", "int"],"default":null,"aliases":["oldFieldName1"]},
>>         {"name":"statusRecords","type":{"type":"array","items":{"name":"StatusAvroRecord","type":"record","fields":[
>>             {"name":"recordId","type":"long"},
>>             {"name":"recordName","type":["null", "string"],"default":null},
>>             {"name":"newFieldName2","type":["null", "string"],"default":null,"aliases":["oldFieldName2"]}
>>         ]}}, "default": []}
>>     ]
>> }
>>
>> We see that our application receives the following Avro data:
>>
>> {
>>     "oldFieldName1": 300,
>>     "statusRecords": [
>>         {
>>             "recordId": 100,
>>             "recordName": "Record1",
>>             "oldFieldName2": "{\"type\":\"XYZ\",\"properties\":{\"property1\":-1.2,\"property2\":\"Value\"}}"
>>         }
>>     ]
>> }
>>
>> Then the application sends the following Avro data downstream:
>>
>> {
>>      "newFieldName1": 300,
>>      "statusRecords": [
>>          {
>>              "recordId": 100,
>>              "recordName": "Record1",
>>              "newFieldName2": null
>>          }
>>      ]
>> }
>>
>> As you can see, newFieldName1 is aliased to oldFieldName1 and has the
>> value from oldFieldName1, so its alias is working.
>>
>> However, newFieldName2 is aliased to oldFieldName2, but it is null
>> instead of having the value from oldFieldName2, so its alias is not
>> working.
>>
>> The only difference I see between newFieldName1 and newFieldName2 is
>> that newFieldName2 is a field within an array item. Do aliases not
>> work for fields in array items? Or is there some other issue?
>>
>> Any idea how I can get the alias for newFieldName2 to work?
>>
>> Thanks,
>> Spencer
