Raghvendra,

You need to use

    DatumReader<ControllerPayload> payloadReader =
        new SpecificDatumReader<>(SCHEMA_V1, SCHEMA_V2);

so that you provide both the writer schema (SCHEMA_V1) and the reader
schema (SCHEMA_V2) to Avro. In your current code Avro assumes both are the
same, which is certainly not the case, and hence it fails. I think this is
what Ryan was referring to as well.

Hope that helps.

On Tue, Feb 2, 2016 at 1:44 PM, Raghvendra Singh <rsi...@appdynamics.com> wrote:

> Hi Ryan
>
> Thanks for your answer. Here is what I am doing in my environment:
>
> 1. Write the data using the old schema:
>
>    SpecificDatumWriter<ControllerPayload> datumWriter =
>        new SpecificDatumWriter<>(SCHEMA_V1)
>
> 2. Try to read the data written with the old schema using the new
>    schema:
>
>    DatumReader<ControllerPayload> payloadReader =
>        new SpecificDatumReader<>(SCHEMA_V2)
>
> In this case SCHEMA_V1 is the old schema, which doesn't have the field,
> while SCHEMA_V2 is the new one, which has the extra field.
>
> Your suggestion, "You should run setSchema on your SpecificDatumReader to
> set the schema the data was written with", is kind of a workaround where
> I have to read the data with the schema it was written with, and hence this
> is not exactly backward compatible. Note that if I do this then I have to
> maintain all the schemas while reading and somehow know which version the
> data was written with, and hence this will make schema evolution pretty
> painful.
>
> Please let me know if I didn't understand your email correctly or there is
> something I missed.
>
> -raghu
>
> On Tue, Feb 2, 2016 at 9:19 AM, Ryan Blue <b...@cloudera.com> wrote:
>
>> Hi Raghvendra,
>>
>> It looks like the problem is that you're using the new schema in place of
>> the schema that the data was written with. You should run setSchema on
>> your SpecificDatumReader to set the schema the data was written with.
>>
>> What's happening is that the schema you're using, the new one, has the
>> new field so Avro assumes it is present and tries to read it. By setting
>> the schema that the data was actually written with, the datum reader will
>> know that it isn't present and will use your default instead. When you read
>> data encoded with the new schema, you need to use it as the written schema
>> instead so the datum reader knows that the field should be read.
>>
>> Does that make sense?
>>
>> rb
>>
>> On 02/01/2016 12:31 PM, Raghvendra Singh wrote:
>>
>>> <http://stackoverflow.com/questions/34733604/avro-schema-doesnt-honor-backward-compatibilty#>
>>>
>>> I have this Avro schema:
>>>
>>> {
>>>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>   "type": "record",
>>>   "name": "MyPayLoad",
>>>   "fields": [
>>>     {"name": "filed1", "type": "string"},
>>>     {"name": "filed2", "type": "long"},
>>>     {"name": "filed3", "type": "boolean"},
>>>     {
>>>       "name": "metrics",
>>>       "type": {
>>>         "type": "array",
>>>         "items": {
>>>           "name": "MyRecord",
>>>           "type": "record",
>>>           "fields": [
>>>             {"name": "min", "type": "long"},
>>>             {"name": "max", "type": "long"},
>>>             {"name": "sum", "type": "long"},
>>>             {"name": "count", "type": "long"}
>>>           ]
>>>         }
>>>       }
>>>     }
>>>   ]
>>> }
>>>
>>> Here is the code which we use to parse the data:
>>>
>>> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>>>     DatumReader<MyPayLoad> payloadReader =
>>>         new SpecificDatumReader<>(MyPayLoad.class);
>>>     Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
>>>     MyPayLoad myPayLoad = null;
>>>     try {
>>>         myPayLoad = payloadReader.read(null, decoder);
>>>     } catch (IOException e) {
>>>         logger.log(Level.SEVERE, e.getMessage(), e);
>>>     }
>>>
>>>     return myPayLoad;
>>> }
>>>
>>> Now I want to add one more field in the schema, so the schema looks like
>>> below:
>>>
>>> {
>>>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>   "type": "record",
>>>   "name": "MyPayLoad",
>>>   "fields":
>>>   [
>>>     {"name": "filed1", "type": "string"},
>>>     {"name": "filed2", "type": "long"},
>>>     {"name": "filed3", "type": "boolean"},
>>>     {
>>>       "name": "metrics",
>>>       "type": {
>>>         "type": "array",
>>>         "items": {
>>>           "name": "MyRecord",
>>>           "type": "record",
>>>           "fields": [
>>>             {"name": "min", "type": "long"},
>>>             {"name": "max", "type": "long"},
>>>             {"name": "sum", "type": "long"},
>>>             {"name": "count", "type": "long"}
>>>           ]
>>>         }
>>>       }
>>>     },
>>>     {"name": "agentType", "type": ["null", "string"], "default": "APP_AGENT"}
>>>   ]
>>> }
>>>
>>> Note the field added, and that a default is defined for it. The problem is
>>> that if we receive data which was written using the older schema, I get
>>> this error:
>>>
>>> java.io.EOFException: null
>>>     at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139) ~[avro-1.7.4.jar:1.7.4]
>>>     at com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38) ~[blitz-shared.jar:na]
>>>
>>> What I understood from this document
>>> <https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html>
>>> is that this should have been backward compatible, but somehow that
>>> doesn't seem to be the case. Any idea what I am doing wrong?
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Cloudera, Inc.
>

--
Swarnim
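
P.S. To make the failure mode concrete, here is a toy sketch in plain Java.
It is not Avro's code or wire format: ToySchemaResolution, ToySchema, and the
long-only fields are all made up for illustration. It only mimics the one
property that matters here, namely that the encoded bytes carry no field
names, so the reader has to be told which schema wrote them. Reading old
bytes with only the new schema runs off the end of the buffer (compare the
EOFException in the trace), while passing both schemas lets the reader fill
the missing field from its default, which is what
new SpecificDatumReader<>(SCHEMA_V1, SCHEMA_V2) asks Avro to do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of writer/reader schema resolution. NOT Avro's implementation. */
final class ToySchemaResolution {

    /** Hypothetical stand-in for a schema: ordered field names plus defaults. */
    static final class ToySchema {
        final List<String> fields;
        final Map<String, Long> defaults;

        ToySchema(List<String> fields, Map<String, Long> defaults) {
            this.fields = fields;
            this.defaults = defaults;
        }
    }

    /** Writes the values in writer-schema order; no names or tags go into the bytes. */
    static byte[] write(ToySchema writer, Map<String, Long> values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String field : writer.fields) {
                out.writeLong(values.get(field));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /**
     * Decodes the payload using the writer schema, then projects the result
     * onto the reader schema, using reader defaults for fields the writer lacked.
     */
    static Map<String, Long> read(ToySchema writer, ToySchema reader, byte[] payload) {
        Map<String, Long> decoded = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            for (String field : writer.fields) {
                // Throws EOFException if the assumed writer schema has more
                // fields than were actually written.
                decoded.put(field, in.readLong());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        Map<String, Long> result = new LinkedHashMap<>();
        for (String field : reader.fields) {
            if (decoded.containsKey(field)) {
                result.put(field, decoded.get(field));
            } else if (reader.defaults.containsKey(field)) {
                result.put(field, reader.defaults.get(field));
            } else {
                throw new IllegalStateException("no value or default for field " + field);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToySchema v1 = new ToySchema(List.of("min", "max"), Map.of());
        ToySchema v2 = new ToySchema(List.of("min", "max", "sum"), Map.of("sum", 0L));
        byte[] oldPayload = write(v1, Map.of("min", 1L, "max", 9L));

        // Both schemas supplied: the missing field comes from the default.
        System.out.println(read(v1, v2, oldPayload));

        // New schema assumed for both roles: the reader expects a third value.
        try {
            read(v2, v2, oldPayload);
        } catch (UncheckedIOException e) {
            System.out.println("failed: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

Running main prints {min=1, max=9, sum=0} and then "failed: EOFException".
The second call is the analogue of constructing the reader with only the new
schema: nothing in the bytes says the third field is absent, so the decoder
keeps reading past the end.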