Thanks for the answer. Actually I get exactly the same behavior with Avro 1.9.0 and the following deserializer in our other app, which uses strictly the Avro codebase, and it fails with the same exceptions. So let's leave the "allegro" library and the lots of other tools out of this discussion. I can use whichever approach. All I need is a single way to deserialize a byte[] into a class generated by avro-maven-plugin which respects the documentation regarding schema evolution. Currently we're using the following deserializer and serializer, and they do not work when it comes to schema evolution. What is the correct way to serialize and deserialize Avro data?
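For reference, this is the two-schema variant I understand the "both schemas" advice to mean. It is a sketch, not our production code: the writer's schema has to come from wherever the bytes were produced (a registry, a file header, ...) -- we don't currently have it available, which may be the whole problem. If the generated class for the reader schema is on the classpath, SpecificDatumReader should return instances of it; otherwise it falls back to generic records.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;

public class AvroResolvingDeserializer {

    // Sketch: decode with schema *resolution*. writerSchema describes the
    // bytes as written; readerSchema is what we want back (e.g. from the
    // generated class). The two-argument constructor is the crucial part --
    // it is what fills in defaults for fields missing on the writer side.
    public static <T> T deserialize(Schema writerSchema,
                                    Schema readerSchema,
                                    byte[] data) throws IOException {
        if (data == null) {
            return null;
        }
        DatumReader<T> datumReader = new SpecificDatumReader<>(writerSchema, readerSchema);
        Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        return datumReader.read(null, decoder);
    }
}
```

With a generated class this would be called as `MyRecord r = AvroResolvingDeserializer.deserialize(writerSchema, MyRecord.getClassSchema(), bytes);` -- again assuming we can obtain the writer's schema somewhere.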
I probably don't understand your remark about GenericRecord or GenericDatumReader. I tried to use GenericDatumReader in the deserializer below, but then it seems I got back just a GenericData$Record instance, which I can then use to access an array of instances; that is not what I'm looking for (IIUC), since in that case I could have just used plain old JSON and deserialized it using Jackson, having no schema evolution problems at all. If that's correct, I'd rather stick to SpecificDatumReader and somehow fix it, if possible. What can be done? Or how is schema evolution intended to be used? I found a lot of questions while searching for this answer.

thanks!
Martin.

deserializer:

public static <T extends SpecificRecordBase> T deserialize(Class<T> targetType,
                                                           byte[] data,
                                                           boolean useBinaryDecoder) {
    try {
        if (data == null) {
            return null;
        }

        log.trace("data='{}'", DatatypeConverter.printHexBinary(data));
        Schema schema = targetType.newInstance().getSchema();
        DatumReader<GenericRecord> datumReader = new SpecificDatumReader<>(schema);
        Decoder decoder = useBinaryDecoder
                ? DecoderFactory.get().binaryDecoder(data, null)
                : DecoderFactory.get().jsonDecoder(schema, new String(data));

        T result = targetType.cast(datumReader.read(null, decoder));
        log.trace("deserialized data='{}'", result);
        return result;
    } catch (Exception ex) {
        throw new SerializationException("Error deserializing data", ex);
    }
}

serializer:

public static <T extends SpecificRecordBase> byte[] serialize(T data,
                                                              boolean useBinaryDecoder,
                                                              boolean pretty) {
    try {
        if (data == null) {
            return new byte[0];
        }

        log.debug("data='{}'", data);
        Schema schema = data.getSchema();
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        Encoder binaryEncoder = useBinaryDecoder
                ? EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null)
                : EncoderFactory.get().jsonEncoder(schema, byteArrayOutputStream, pretty);

        DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
        datumWriter.write(data, binaryEncoder);
        binaryEncoder.flush();
        byteArrayOutputStream.close();

        byte[] result = byteArrayOutputStream.toByteArray();
        log.debug("serialized data='{}'", DatatypeConverter.printHexBinary(result));
        return result;
    } catch (IOException ex) {
        throw new SerializationException("Can't serialize data='" + data + "'", ex);
    }
}

On Tue, Jul 30, 2019 at 13:48, Ryan Skraba <r...@skraba.com> wrote:

> Hello! Schema evolution relies on both the writer and reader schemas
> being available.
>
> It looks like the allegro tool you are using is using the
> GenericDatumReader that assumes the reader and writer schema are the
> same:
>
> https://github.com/allegro/json-avro-converter/blob/json-avro-converter-0.2.8/converter/src/main/java/tech/allegro/schema/json2avro/converter/JsonAvroConverter.java#L83
>
> I do not believe that the "default" value is taken into account for
> data that is strictly missing from the binary input, just when a field
> is known to be in the reader schema but missing from the original
> writer.
>
> You may have more luck reading the GenericRecord with a
> GenericDatumReader with both schemas, and using the
> `convertToJson(record)`.
>
> I hope this is useful -- Ryan
>
> On Tue, Jul 30, 2019 at 10:20 AM Martin Mucha <alfon...@gmail.com> wrote:
> >
> > Hi,
> >
> > I've got some issues with / misunderstanding of Avro schema evolution.
> >
> > When reading through the Avro documentation, for example [1], I understood
> > that schema evolution is supported, and that if I add a column with a
> > specified default, it should be backwards compatible (and even forward
> > compatible when I remove it again).
> > Sounds great, so I added a column defined as:
> >
> >     {
> >       "name": "newColumn",
> >       "type": ["null","string"],
> >       "default": null,
> >       "doc": "something wrong"
> >     }
> >
> > and when I try to consume some topic that has had this schema from the
> > beginning, it fails with this message:
> >
> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
> >     at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:424)
> >     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
> >     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> >     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
> >     at tech.allegro.schema.json2avro.converter.JsonAvroConverter.convertToJson(JsonAvroConverter.java:83)
> >
> > To give a little bit more information: the Avro schema defines one
> > top-level type with 2 fields: a string describing the type of the message,
> > and a union of N types. All N-1 unmodified types can be read, but the one
> > updated with the optional, default-having column cannot. I'm not sure
> > whether this design is strictly speaking correct, but that's not the point
> > (feel free to criticise it and recommend a better approach!). I'm after
> > schema evolution, which seems not to be working.
> >
> > And if we alter the type definition to:
> >
> >     "type": "string",
> >     "default": ""
> >
> > it still does not work, and the generated error is:
> >
> > Caused by: org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -1
> >     at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:336)
> >     at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
> >     at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
> >     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:422)
> >     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:414)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:181)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
> >     at tech.allegro.schema.json2avro.converter.JsonAvroConverter.convertToJson(JsonAvroConverter.java:83)
> >
> > Am I doing something wrong?
> >
> > thanks,
> > Martin.
> >
> > [1] https://docs.oracle.com/database/nosql-12.1.3.4/GettingStartedGuide/schemaevolution.html#changeschema-rules
>
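P.S. To make sure I understand the "both schemas" point, here is a minimal self-contained sketch of what I'd expect two-schema resolution to do. The record name "Msg" and the field names are made up for the example: write with a v1 schema, read with GenericDatumReader(v1, v2), and get the declared default back for the field added in v2.

```java
import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class EvolutionDemo {
    // v1: the schema the bytes were written with (no newColumn yet)
    static final Schema V1 = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Msg\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"}]}");
    // v2: the reader's schema, with the defaulted column added
    static final Schema V2 = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Msg\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"},"
      + "{\"name\":\"newColumn\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

    static GenericRecord writeV1ReadV2() throws Exception {
        // serialize a record under the old schema
        GenericRecord rec = new GenericData.Record(V1);
        rec.put("id", "42");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(V1).write(rec, enc);
        enc.flush();

        // the crucial part: pass writer schema AND reader schema,
        // so the ResolvingDecoder can supply the default for newColumn
        DatumReader<GenericRecord> reader = new GenericDatumReader<>(V1, V2);
        return reader.read(null, DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
    }

    public static void main(String[] args) throws Exception {
        GenericRecord r = writeV1ReadV2();
        System.out.println(r.get("id") + " / " + r.get("newColumn"));
    }
}
```

If this is the intended usage, I'd expect `r.get("newColumn")` to come back as the null default rather than throwing.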