Thank you very much for the in-depth answer. I understand much better now how it works, and I will test it shortly. Thank you for your time.
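
For reference, this is roughly what I intend to test -- our deserializer
(quoted further below), reworked per your answer to take the writer schema
into account. Just a sketch, untested; the writerSchema parameter is the
new part:

public static <T extends SpecificRecordBase> T deserialize(Class<T> targetType,
                                                           byte[] data,
                                                           Schema writerSchema,
                                                           boolean useBinaryDecoder) {
    try {
        if (data == null) {
            return null;
        }
        // The generated class carries the *reader* schema; the writer schema
        // has to come from wherever the data was originally written.
        Schema readerSchema = targetType.newInstance().getSchema();
        DatumReader<T> datumReader = new SpecificDatumReader<>(writerSchema, readerSchema);
        // JSON-encoded data is likewise decoded against the schema it was
        // written with.
        Decoder decoder = useBinaryDecoder
                ? DecoderFactory.get().binaryDecoder(data, null)
                : DecoderFactory.get().jsonDecoder(writerSchema, new String(data));
        return datumReader.read(null, decoder);
    } catch (Exception ex) {
        throw new SerializationException("Error deserializing data", ex);
    }
}
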
Martin.

On Tue, Jul 30, 2019 at 5:09 PM Ryan Skraba <r...@skraba.com> wrote:
> Hello! It's the same issue in your example code as with allegro, even
> with the SpecificDatumReader.
>
> This line: datumReader = new SpecificDatumReader<>(schema)
> should be: datumReader = new SpecificDatumReader<>(originalSchema, schema)
>
> In Avro, the original schema is commonly known as the writer schema
> (the instance that originally wrote the binary data). Schema evolution
> applies when you are using the constructor of the SpecificDatumReader
> that takes *both* reader and writer schemas.
>
> As a concrete example, if your original schema was:
>
> {
>   "type": "record",
>   "name": "Simple",
>   "fields": [
>     {"name": "id", "type": "int"},
>     {"name": "name", "type": "string"}
>   ]
> }
>
> And you added a field:
>
> {
>   "type": "record",
>   "name": "SimpleV2",
>   "aliases": ["Simple"],
>   "fields": [
>     {"name": "id", "type": "int"},
>     {"name": "name", "type": "string"},
>     {"name": "description", "type": ["null", "string"], "default": null}
>   ]
> }
>
> (Note the alias pointing at the old record name and the "default" on the
> new field -- schema resolution needs both here, since the record was
> renamed and a field was added.)
>
> You could do the following safely, assuming that the Simple and SimpleV2
> classes are generated by the avro-maven-plugin:
>
> @Test
> public void testSerializeDeserializeEvolution() throws IOException {
>   // Write a Simple v1 to bytes using your exact method.
>   byte[] v1AsBytes = serialize(new Simple(1, "name1"), true, false);
>
>   // Read as a SimpleV2, same as your method but with both the writer
>   // and the reader schema.
>   DatumReader<SimpleV2> datumReader =
>       new SpecificDatumReader<>(Simple.getClassSchema(),
>           SimpleV2.getClassSchema());
>   Decoder decoder = DecoderFactory.get().binaryDecoder(v1AsBytes, null);
>   SimpleV2 v2 = datumReader.read(null, decoder);
>
>   assertThat(v2.getId(), is(1));
>   assertThat(v2.getName(), is(new Utf8("name1")));
>   assertThat(v2.getDescription(), nullValue());
> }
>
> This demonstrates the principle with two different schemas and
> SpecificRecords in the same test, but the same thing applies if it's the
> same record that has evolved -- you need to know the original schema that
> wrote the data in order to apply the schema that you're now using for
> reading.
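>
> The same works when only the old schema's JSON is at hand rather than a
> generated class -- a minimal sketch (assuming the v1 schema text was kept
> somewhere, e.g. a schema registry or a file from the old release;
> oldSchemaJson is hypothetical):
>
> // Parse the schema the data was originally written with.
> Schema writerSchema = new Schema.Parser().parse(oldSchemaJson);
> // Resolve the old binary data against the current generated class.
> DatumReader<SimpleV2> datumReader =
>     new SpecificDatumReader<>(writerSchema, SimpleV2.getClassSchema());
> Decoder decoder = DecoderFactory.get().binaryDecoder(v1AsBytes, null);
> SimpleV2 record = datumReader.read(null, decoder);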
>
> I hope this clarifies what you are looking for!
>
> All my best, Ryan
>
> On Tue, Jul 30, 2019 at 3:30 PM Martin Mucha <alfon...@gmail.com> wrote:
> >
> > Thanks for the answer.
> >
> > Actually, I get exactly the same behavior with avro 1.9.0 and the
> > following deserializer in our other app, which uses strictly the avro
> > codebase and fails with the same exceptions. So let's leave the
> > "allegro" library and the other tools out of our discussion.
> > I can use whichever approach. All I need is a single way to deserialize
> > a byte[] into a class generated by the avro-maven-plugin that respects
> > the documented rules for schema evolution. Currently we're using the
> > following deserializer and serializer, and they do not work when it
> > comes to schema evolution. What is the correct way to serialize and
> > deserialize avro data?
> >
> > I probably don't understand your mention of GenericRecord or
> > GenericDatumReader. I tried to use GenericDatumReader in the
> > deserializer below, but it seems I got back just a GenericData$Record
> > instance, which I can then use to access an array of instances; that is
> > not what I'm looking for (IIUC), since in that case I could have just
> > used plain old JSON and deserialized it with jackson, having no schema
> > evolution problems at all. If that's correct, I'd rather stick with the
> > SpecificDatumReader and somehow fix it, if possible.
> >
> > What can be done? Or how is schema evolution intended to be used? I
> > found a lot of questions searching for this answer.
> >
> > thanks!
> > Martin.
> >
> > deserializer:
> >
> > public static <T extends SpecificRecordBase> T deserialize(Class<T> targetType,
> >                                                            byte[] data,
> >                                                            boolean useBinaryDecoder) {
> >     try {
> >         if (data == null) {
> >             return null;
> >         }
> >
> >         log.trace("data='{}'", DatatypeConverter.printHexBinary(data));
> >
> >         Schema schema = targetType.newInstance().getSchema();
> >         DatumReader<GenericRecord> datumReader = new SpecificDatumReader<>(schema);
> >         Decoder decoder = useBinaryDecoder
> >                 ? DecoderFactory.get().binaryDecoder(data, null)
> >                 : DecoderFactory.get().jsonDecoder(schema, new String(data));
> >
> >         T result = targetType.cast(datumReader.read(null, decoder));
> >         log.trace("deserialized data='{}'", result);
> >         return result;
> >     } catch (Exception ex) {
> >         throw new SerializationException("Error deserializing data", ex);
> >     }
> > }
> >
> > serializer:
> >
> > public static <T extends SpecificRecordBase> byte[] serialize(T data,
> >                                                               boolean useBinaryDecoder,
> >                                                               boolean pretty) {
> >     try {
> >         if (data == null) {
> >             return new byte[0];
> >         }
> >
> >         log.debug("data='{}'", data);
> >         Schema schema = data.getSchema();
> >         ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
> >         Encoder binaryEncoder = useBinaryDecoder
> >                 ? EncoderFactory.get().binaryEncoder(byteArrayOutputStream, null)
> >                 : EncoderFactory.get().jsonEncoder(schema, byteArrayOutputStream, pretty);
> >
> >         DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
> >         datumWriter.write(data, binaryEncoder);
> >
> >         binaryEncoder.flush();
> >         byteArrayOutputStream.close();
> >
> >         byte[] result = byteArrayOutputStream.toByteArray();
> >         log.debug("serialized data='{}'", DatatypeConverter.printHexBinary(result));
> >         return result;
> >     } catch (IOException ex) {
> >         throw new SerializationException("Can't serialize data='" + data, ex);
> >     }
> > }
> >
> > On Tue, Jul 30, 2019 at 1:48 PM Ryan Skraba <r...@skraba.com> wrote:
> >>
> >> Hello! Schema evolution relies on both the writer and reader schemas
> >> being available.
> >>
> >> It looks like the allegro tool you are using uses a GenericDatumReader
> >> that assumes the reader and writer schemas are the same:
> >>
> >> https://github.com/allegro/json-avro-converter/blob/json-avro-converter-0.2.8/converter/src/main/java/tech/allegro/schema/json2avro/converter/JsonAvroConverter.java#L83
> >>
> >> I do not believe that the "default" value is taken into account for
> >> data that is strictly missing from the binary input, just when a field
> >> is known to be in the reader schema but missing from the original
> >> writer.
> >>
> >> You may have more luck reading the GenericRecord with a
> >> GenericDatumReader with both schemas, and using the
> >> `convertToJson(record)`.
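> >>
> >> A rough sketch of that route (untested; writerSchema and readerSchema
> >> stand for the two schema versions, and this assumes the converter
> >> exposes a convertToJson(GenericRecord) overload):
> >>
> >> DatumReader<GenericRecord> reader =
> >>     new GenericDatumReader<>(writerSchema, readerSchema);
> >> Decoder decoder = DecoderFactory.get().binaryDecoder(avroBytes, null);
> >> // Schema resolution happens during the read; the added field is
> >> // filled in from its default.
> >> GenericRecord record = reader.read(null, decoder);
> >> byte[] json = new JsonAvroConverter().convertToJson(record);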
> >>
> >> I hope this is useful -- Ryan
> >>
> >> On Tue, Jul 30, 2019 at 10:20 AM Martin Mucha <alfon...@gmail.com> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I've got some issues with (or a misunderstanding of) AVRO schema
> >> > evolution.
> >> >
> >> > When reading through the avro documentation, for example [1], I
> >> > understood that schema evolution is supported, and that if I added a
> >> > column with a specified default, it should be backwards compatible
> >> > (and even forward compatible when I remove it again). Sounds great,
> >> > so I added a column defined as:
> >> >
> >> > {
> >> >   "name": "newColumn",
> >> >   "type": ["null", "string"],
> >> >   "default": null,
> >> >   "doc": "something wrong"
> >> > }
> >> >
> >> > and tried to consume some topic having this schema from the
> >> > beginning; it fails with the message:
> >> >
> >> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
> >> >     at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:424)
> >> >     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
> >> >     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> >> >     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
> >> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
> >> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >> >     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
> >> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
> >> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> >> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
> >> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >> >     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
> >> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
> >> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> >> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
> >> >     at tech.allegro.schema.json2avro.converter.JsonAvroConverter.convertToJson(JsonAvroConverter.java:83)
> >> >
> >> > To give a little bit more information: the Avro schema defines one
> >> > top-level type having 2 fields: a string describing the type of the
> >> > message, and a union of N types. All N-1 non-modified types can be
> >> > read, but the one updated with the optional, default-having column
> >> > cannot. I'm not sure if this design is strictly speaking correct, but
> >> > that's not the point (feel free to criticise and recommend a better
> >> > approach!). I'm after schema evolution, which seems not to be
> >> > working.
> >> >
> >> > And if we alter the type definition to:
> >> >
> >> >   "type": "string",
> >> >   "default": ""
> >> >
> >> > it still does not work, and the generated error is:
> >> > Caused by: org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -1
> >> >     at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:336)
> >> >     at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
> >> >     at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
> >> >     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:422)
> >> >     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:414)
> >> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:181)
> >> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >> >     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
> >> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
> >> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> >> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
> >> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >> >     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
> >> >     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
> >> >     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
> >> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> >> >     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
> >> >     at tech.allegro.schema.json2avro.converter.JsonAvroConverter.convertToJson(JsonAvroConverter.java:83)
> >> >
> >> > Am I doing something wrong?
> >> >
> >> > thanks,
> >> > Martin.
> >> >
> >> > [1] https://docs.oracle.com/database/nosql-12.1.3.4/GettingStartedGuide/schemaevolution.html#changeschema-rules