Hi, I'm encountering weird behavior and have no idea how to fix it. Any suggestions welcome.
The issue revolves around a union type at the top level of the schema, which I personally dislike and consider a hack, but I understand the motivation behind it: someone (probably) wanted to declare N types within a single .avsc file. The drawback is that this construct does not support Avro schema evolution (read on). If there is a way to reshape the .avsc so that multiple types are still available at the top level and evolution works, I'm listening.

Now the code. Old version of the schema:

```json
{
  "namespace": "test",
  "name": "TestAvro",
  "type": "record",
  "fields": [
    { "name": "a", "type": "string" }
  ]
}
```

Updated version of the schema, to which the former should evolve:

```json
{
  "namespace": "test",
  "name": "TestAvro",
  "type": "record",
  "fields": [
    { "name": "a", "type": "string" },
    { "name": "b", "type": ["null", "string"], "default": null }
  ]
}
```

Serialization:

```java
private <T extends SpecificRecordBase> byte[] serialize(final T data) {
    try (ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream()) {
        Encoder binaryEncoder = EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null);
        DatumWriter<T> datumWriter = new SpecificDatumWriter<>(data.getSchema());
        datumWriter.write(data, binaryEncoder);
        binaryEncoder.flush();
        return byteArrayOutputStream.toByteArray();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
```

Deserialization:

```java
private static <T extends SpecificRecordBase> T deserializeUsingSchemaEvolution(Class<T> targetType,
                                                                                Schema readerSchema,
                                                                                Schema writerSchema,
                                                                                byte[] data) {
    try {
        if (data == null) {
            return null;
        }
        DatumReader<GenericRecord> datumReader = new SpecificDatumReader<>(writerSchema, readerSchema);
        Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        return targetType.cast(datumReader.read(null, decoder));
    } catch (Exception ex) {
        throw new SerializationException("Error deserializing data", ex);
    }
}
```

This WORKS: data is serialized and deserialized, and evolution works as intended. Now put square brackets into both .avsc files, i.e.
add a first and a last character to those files, so that the first character of each schema is `[` and the last is `]`. After that, deserialization won't work at all. The errors produced vary wildly depending on the Avro version and the schemata: simple "cannot be deserialized" errors, `Utf8` cannot be cast to `String` errors, or even "X cannot be cast to Y", where X and Y are random types from the top-level union and such a cast makes no sense. Any suggestions would be greatly appreciated, as I inherited these schemata with top-level unions and really have no idea how to make them work. Thanks, M.
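For concreteness, here is what the bracketed variant of the old schema looks like. This is a minimal sketch of the union form described above; the real inherited files contain several record types inside the brackets, not just one:

```json
[
  {
    "namespace": "test",
    "name": "TestAvro",
    "type": "record",
    "fields": [
      { "name": "a", "type": "string" }
    ]
  }
]
```

Note that with the brackets, `Schema.Parser` parses the whole file as a single schema of type UNION rather than as a RECORD, which is presumably where the behavior starts to diverge.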