Hi, I'm encountering some weird behavior and have no idea how to fix it. Any
suggestions welcome.

The issue revolves around a union type at the top level of the schema, which
I personally dislike and consider a hack, but I understand the motivation
behind it: someone (probably) wanted to declare N types within a single .avsc
file. The drawback is that this construct does not support Avro schema
evolution (read on). If there is a way to reshape that .avsc so that multiple
types are somehow available at the top level and evolution still works, I'm
listening.
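
For illustration, such a top-level union .avsc looks roughly like this
(the record names here are made up):

[
  {
    "namespace": "test",
    "name": "TypeOne",
    "type": "record",
    "fields": [ { "name": "a", "type": "string" } ]
  },
  {
    "namespace": "test",
    "name": "TypeTwo",
    "type": "record",
    "fields": [ { "name": "b", "type": "string" } ]
  }
]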

Now the code:

the old version of the schema:

{
  "namespace": "test",
  "name": "TestAvro",
  "type": "record",
  "fields": [
    {
      "name": "a",
      "type": "string"
    }
  ]
}

the updated version of the schema, to which the former should evolve:

{
  "namespace": "test",
  "name": "TestAvro",
  "type": "record",
  "fields": [
    {
      "name": "a",
      "type": "string"
    },
    {
      "name": "b",
      "type": ["null", "string"],
      "default": null
    }
  ]
}


serialization:


private <T extends SpecificRecordBase> byte[] serialize(final T data) {
    try (ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream()) {
        // Binary encoder writing into the in-memory stream
        Encoder binaryEncoder = EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
        // The writer schema comes from the generated class itself
        DatumWriter<T> datumWriter = new SpecificDatumWriter<>(data.getSchema());
        datumWriter.write(data, binaryEncoder);
        binaryEncoder.flush();

        return byteArrayOutputStream.toByteArray();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
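
which I call roughly like this, assuming the TestAvro class generated from
the schema above (the builder comes from Avro's code generation):

TestAvro record = TestAvro.newBuilder()
        .setA("hello")
        .build();
byte[] bytes = serialize(record);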


deserialization:

private static <T extends SpecificRecordBase> T deserializeUsingSchemaEvolution(Class<T> targetType,
                                                                                Schema readerSchema,
                                                                                Schema writerSchema,
                                                                                byte[] data) {
    try {
        if (data == null) {
            return null;
        }

        // Resolving reader: decodes with the writer schema, resolves into the reader schema
        DatumReader<GenericRecord> datumReader = new SpecificDatumReader<>(writerSchema, readerSchema);
        Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);

        return targetType.cast(datumReader.read(null, decoder));
    } catch (Exception ex) {
        throw new SerializationException("Error deserializing data", ex);
    }
}
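
The evolution round trip I'm testing looks roughly like this, assuming the
bytes were produced against the old schema and OLD_SCHEMA_JSON holds the old
.avsc content as a string:

Schema writerSchema = new Schema.Parser().parse(OLD_SCHEMA_JSON);
Schema readerSchema = TestAvro.getClassSchema(); // new schema, with field "b"
TestAvro evolved = deserializeUsingSchemaEvolution(TestAvro.class, readerSchema, writerSchema, bytes);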


--> this WORKS. Data is serialized and deserialized, and evolution
works as intended.

Now wrap both .avsc files in square brackets, i.e. add a first and a last
character to each file so that the first character of each schema is [
and the last is ], turning each one into a top-level union.
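
For example, the old schema then becomes:

[
  {
    "namespace": "test",
    "name": "TestAvro",
    "type": "record",
    "fields": [
      {
        "name": "a",
        "type": "string"
      }
    ]
  }
]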


After that, deserialization won't work at all. The errors produced vary
wildly depending on the Avro version and the schemata: simple "cannot be
deserialized" errors, "Utf8 cannot be cast to String" errors, or even
"X cannot be cast to Y", where X and Y are seemingly random types from the
top-level union and the cast makes no sense.
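
The only workaround I can think of so far (an untested sketch; the helper
name is mine) is to unwrap the union before resolution, so that reader and
writer schemas are plain records again:

private static Schema recordBranch(Schema schema, String fullName) {
    // If the top-level schema is a union, pick the branch whose full name matches
    if (schema.getType() != Schema.Type.UNION) {
        return schema; // already a plain record
    }
    for (Schema branch : schema.getTypes()) {
        if (fullName.equals(branch.getFullName())) {
            return branch;
        }
    }
    throw new IllegalArgumentException("No union branch named " + fullName);
}

I'd then pass recordBranch(writerSchema, "test.TestAvro") and
recordBranch(readerSchema, "test.TestAvro") into
deserializeUsingSchemaEvolution, but I don't know whether that is the
intended way to handle these files.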


Any suggestions would be greatly appreciated; I inherited these schemata
with top-level unions and really have no idea how to make them work.

Thanks,

M.
