Thanks for the answer.

Actually I see exactly the same behavior with Avro 1.9.0 and the
deserializer below in another of our apps, which uses only the Avro
codebase and fails with the same exceptions. So let's leave the "allegro"
library and the other tools out of the discussion.
I can use whichever approach. All I need is a single way to deserialize a
byte[] into a class generated by avro-maven-plugin that respects the
documentation regarding schema evolution. Currently we're using the
deserializer and serializer below, and they do not work when it comes to
schema evolution. What is the correct way to serialize and deserialize
Avro data?

I probably don't understand your point about GenericRecord or
GenericDatumReader. I tried using GenericDatumReader in the deserializer
below, but then I seem to get back just a GenericData$Record instance,
which I can only use to access an array of values. That is not what I'm
looking for (IIUC), since in that case I could just as well use plain old
JSON and deserialize it with Jackson, with no schema evolution problems at
all. If that's correct, I'd rather stick with SpecificDatumReader and
somehow fix it, if possible.
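
Just to check that I understood the suggestion correctly: is the idea
something like the sketch below? The writerSchema parameter is hypothetical
on my side; at read time I only have the byte[] and the generated class,
not the schema that actually wrote the data, which may be the real gap.

// Sketch of what I understood from the suggestion: resolve the writer's
// schema against the reader's schema while decoding; the result is a
// GenericData.Record. "writerSchema" is an assumption, I don't have it
// available at read time today.
public static GenericRecord readGeneric(byte[] data, Schema writerSchema, Schema readerSchema) throws IOException {
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(writerSchema, readerSchema);
    Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
    return datumReader.read(null, decoder);
}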

What can be done? Or how is schema evolution intended to be used? I found a
lot of questions while searching for this answer.

thanks!
Martin.

deserializer:

public static <T extends SpecificRecordBase> T deserialize(Class<T> targetType,
                                                            byte[] data,
                                                            boolean useBinaryDecoder) {
        try {
            if (data == null) {
                return null;
            }

            log.trace("data='{}'", DatatypeConverter.printHexBinary(data));

            Schema schema = targetType.newInstance().getSchema();
            DatumReader<GenericRecord> datumReader = new SpecificDatumReader<>(schema);
            Decoder decoder = useBinaryDecoder
                    ? DecoderFactory.get().binaryDecoder(data, null)
                    : DecoderFactory.get().jsonDecoder(schema, new String(data));

            T result = targetType.cast(datumReader.read(null, decoder));
            log.trace("deserialized data='{}'", result);
            return result;
        } catch (Exception ex) {
            throw new SerializationException("Error deserializing data", ex);
        }
    }
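
For reference, my best guess at what the deserializer would have to look
like for evolution to actually work. This is only a sketch: the
writerSchema parameter would have to come from somewhere out of band
(e.g. a schema registry or a file header), and that is exactly the part
I'm missing.

public static <T extends SpecificRecordBase> T deserializeWithWriterSchema(Class<T> targetType,
                                                                           Schema writerSchema,
                                                                           byte[] data) {
    try {
        if (data == null) {
            return null;
        }
        // the reader schema comes from the generated class, the writer schema
        // from wherever the data was produced; SpecificDatumReader resolves
        // one against the other (this is where evolution happens)
        Schema readerSchema = targetType.newInstance().getSchema();
        DatumReader<T> datumReader = new SpecificDatumReader<>(writerSchema, readerSchema);
        Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        return datumReader.read(null, decoder);
    } catch (Exception ex) {
        throw new SerializationException("Error deserializing data", ex);
    }
}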

serializer:
public static <T extends SpecificRecordBase> byte[] serialize(T data, boolean useBinaryDecoder, boolean pretty) {
        try {
            if (data == null) {
                return new byte[0];
            }

            log.debug("data='{}'", data);
            Schema schema = data.getSchema();
            ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
            Encoder binaryEncoder = useBinaryDecoder
                    ? EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null)
                    : EncoderFactory.get().jsonEncoder(schema, byteArrayOutputStream, pretty);

            DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
            datumWriter.write(data, binaryEncoder);

            binaryEncoder.flush();
            byteArrayOutputStream.close();

            byte[] result = byteArrayOutputStream.toByteArray();
            log.debug("serialized data='{}'", DatatypeConverter.printHexBinary(result));
            return result;
        } catch (IOException ex) {
            throw new SerializationException(
                    "Can't serialize data='" + data, ex);
        }
    }
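
And in case it matters for the discussion: the only mechanism I know of
where the writer's schema travels with the payload itself is the Avro
object container file format, roughly as sketched below (method names are
mine, just for illustration; it uses DataFileWriter, DataFileReader and
SeekableByteArrayInput from org.apache.avro.file, plus SpecificDatumWriter
from org.apache.avro.specific). Is that the intended way to get schema
evolution when all I have is a byte[]?

public static <T extends SpecificRecordBase> byte[] serializeAsContainerFile(T record) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DatumWriter<T> datumWriter = new SpecificDatumWriter<>(record.getSchema());
    // DataFileWriter stores the writer's schema in the header of the output
    try (DataFileWriter<T> fileWriter = new DataFileWriter<>(datumWriter)) {
        fileWriter.create(record.getSchema(), out);
        fileWriter.append(record);
    }
    return out.toByteArray();
}

public static <T extends SpecificRecordBase> T deserializeContainerFile(Class<T> targetType, byte[] data) throws Exception {
    Schema readerSchema = targetType.newInstance().getSchema();
    DatumReader<T> datumReader = new SpecificDatumReader<>(readerSchema);
    // DataFileReader picks the writer's schema up from the header and
    // resolves it against the reader schema of the generated class
    try (DataFileReader<T> fileReader =
                 new DataFileReader<>(new SeekableByteArrayInput(data), datumReader)) {
        return fileReader.hasNext() ? fileReader.next() : null;
    }
}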

On Tue, Jul 30, 2019 at 13:48, Ryan Skraba <r...@skraba.com> wrote:

> Hello!  Schema evolution relies on both the writer and reader schemas
> being available.
>
> It looks like the allegro tool you are using is using the
> GenericDatumReader that assumes the reader and writer schema are the
> same:
>
>
> https://github.com/allegro/json-avro-converter/blob/json-avro-converter-0.2.8/converter/src/main/java/tech/allegro/schema/json2avro/converter/JsonAvroConverter.java#L83
>
> I do not believe that the "default" value is taken into account for
> data that is strictly missing from the binary input, just when a field
> is known to be in the reader schema but missing from the original
> writer.
>
> You may have more luck reading the GenericRecord with a
> GenericDatumReader with both schemas, and using the
> `convertToJson(record)`.
>
> I hope this is useful -- Ryan
>
>
>
> On Tue, Jul 30, 2019 at 10:20 AM Martin Mucha <alfon...@gmail.com> wrote:
> >
> > Hi,
> >
> > I've got some issues/misunderstanding of AVRO schema evolution.
> >
> > When reading through the Avro documentation, for example [1], I understood
> > that schema evolution is supported and that if I add a column with a
> > specified default, it should be backwards compatible (and even forwards
> > compatible when I remove it again). Sounds great, so I added a column
> > defined as:
> >
> >         {
> >           "name": "newColumn",
> >           "type": ["null","string"],
> >           "default": null,
> >           "doc": "something wrong"
> >         }
> >
> > and tried to consume a topic that has had this schema from the beginning;
> > it fails with the following message:
> >
> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
> >     at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:424)
> >     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
> >     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> >     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
> >     at tech.allegro.schema.json2avro.converter.JsonAvroConverter.convertToJson(JsonAvroConverter.java:83)
> > To give a little bit more information: the Avro schema defines one
> > top-level type with two fields, a string describing the type of the
> > message and a union of N types. All N-1 unmodified types can be read,
> > but the one updated with the optional, default-carrying column cannot.
> > I'm not sure whether this design is strictly speaking correct, but
> > that's not the point (feel free to criticise it and recommend a better
> > approach!). I'm after schema evolution, which seems not to be working.
> >
> >
> > And if we alter the type definition to:
> >
> > "type": "string",
> > "default": ""
> > it still does not work and the generated error is:
> >
> > Caused by: org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -1
> >     at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:336)
> >     at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
> >     at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
> >     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:422)
> >     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:414)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:181)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
> >     at tech.allegro.schema.json2avro.converter.JsonAvroConverter.convertToJson(JsonAvroConverter.java:83)
> >
> > Am I doing something wrong?
> >
> > thanks,
> > Martin.
> >
> > [1] https://docs.oracle.com/database/nosql-12.1.3.4/GettingStartedGuide/schemaevolution.html#changeschema-rules
>
