Raghvendra,

You need to use

    DatumReader<ControllerPayload> payloadReader =
        new SpecificDatumReader<>(SCHEMA_V1, SCHEMA_V2);

This way you provide both the writer schema (SCHEMA_V1) and the reader
schema (SCHEMA_V2) to Avro. In your current code Avro assumes the two are
the same, which is certainly not the case, and hence it fails. I think
this is what Ryan was referring to as well.
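
For reference, the read path from your earlier mail would then look
roughly like this (just a sketch, assuming SCHEMA_V1 and SCHEMA_V2 are
the parsed Schema objects for the old and new versions):

    DatumReader<ControllerPayload> payloadReader =
        new SpecificDatumReader<>(SCHEMA_V1, SCHEMA_V2); // writer, reader
    Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
    ControllerPayload record = payloadReader.read(null, decoder);
    // agentType is absent in the V1 data, so its declared default is used

The resolving decoder then knows the written bytes contain no agentType
field and substitutes the default instead of trying to decode one.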

Hope that helps.



On Tue, Feb 2, 2016 at 1:44 PM, Raghvendra Singh <rsi...@appdynamics.com>
wrote:

> Hi Ryan
>
> Thanks for your answer. Here is what I am doing in my environment:
>
> 1. Write the data using the old schema
>
>     SpecificDatumWriter<ControllerPayload> datumWriter =
>         new SpecificDatumWriter<>(SCHEMA_V1);
>
> 2. Now trying to read the data written with the old schema using the new
> schema
>
>     DatumReader<ControllerPayload> payloadReader =
>         new SpecificDatumReader<>(SCHEMA_V2);
>
> In this case SCHEMA_V1 is the old schema, which doesn't have the field,
> while SCHEMA_V2 is the new one, which has the extra field.
>
> Your suggestion, *"You should run setSchema on your SpecificDatumReader to
> set the schema the data was written with"*, is kind of a workaround where
> I have to read the data with the schema it was written with, and hence this
> is not exactly backward compatible. Note that if I do this then I have to
> maintain all the schemas while reading and somehow know which version the
> data was written with, which will make schema evolution pretty
> painful.
>
> Please let me know if I didn't understand your email correctly or there is
> something I missed.
>
> -raghu
>
> On Tue, Feb 2, 2016 at 9:19 AM, Ryan Blue <b...@cloudera.com> wrote:
>
>> Hi Raghvendra,
>>
>> It looks like the problem is that you're using the new schema in place of
>> the schema that the data was written with.  You should run setSchema on
>> your SpecificDatumReader to set the schema the data was written with.
>>
>> What's happening is that the schema you're using, the new one, has the
>> new field so Avro assumes it is present and tries to read it. By setting
>> the schema that the data was actually written with, the datum reader will
>> know that it isn't present and will use your default instead. When you read
>> data encoded with the new schema, you need to use it as the written schema
>> instead so the datum reader knows that the field should be read.
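>>
>> For example (just a sketch, assuming writerSchema is the Schema object
>> for the version the payload was actually encoded with):
>>
>>     SpecificDatumReader<MyPayLoad> reader =
>>         new SpecificDatumReader<>(MyPayLoad.class);
>>     reader.setSchema(writerSchema); // schema the data was written with
>>     Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
>>     MyPayLoad record = reader.read(null, decoder);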
>>
>> Does that make sense?
>>
>> rb
>>
>> On 02/01/2016 12:31 PM, Raghvendra Singh wrote:
>>
>>>
>>> I have this Avro schema:
>>>
>>> {
>>>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>   "type": "record",
>>>   "name": "MyPayLoad",
>>>   "fields": [
>>>       {"name": "filed1",  "type": "string"},
>>>       {"name": "filed2",     "type": "long"},
>>>       {"name": "filed3",  "type": "boolean"},
>>>       {
>>>            "name" : "metrics",
>>>            "type":
>>>            {
>>>               "type" : "array",
>>>               "items":
>>>               {
>>>                   "name": "MyRecord",
>>>                   "type": "record",
>>>                   "fields" :
>>>                       [
>>>                         {"name": "min", "type": "long"},
>>>                         {"name": "max", "type": "long"},
>>>                         {"name": "sum", "type": "long"},
>>>                         {"name": "count", "type": "long"}
>>>                       ]
>>>               }
>>>            }
>>>       }
>>>    ]}
>>>
>>> Here is the code which we use to parse the data
>>>
>>> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>>>     DatumReader<MyPayLoad> payloadReader =
>>>         new SpecificDatumReader<>(MyPayLoad.class);
>>>     Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
>>>     MyPayLoad myPayLoad = null;
>>>     try {
>>>         myPayLoad = payloadReader.read(null, decoder);
>>>     } catch (IOException e) {
>>>         logger.log(Level.SEVERE, e.getMessage(), e);
>>>     }
>>>
>>>     return myPayLoad;
>>> }
>>>
>>> Now I want to add one more field to the schema, so the new schema looks
>>> like below:
>>>
>>>   {
>>>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>   "type": "record",
>>>   "name": "MyPayLoad",
>>>   "fields": [
>>>       {"name": "filed1",  "type": "string"},
>>>       {"name": "filed2",     "type": "long"},
>>>       {"name": "filed3",  "type": "boolean"},
>>>       {
>>>            "name" : "metrics",
>>>            "type":
>>>            {
>>>               "type" : "array",
>>>               "items":
>>>               {
>>>                   "name": "MyRecord",
>>>                   "type": "record",
>>>                   "fields" :
>>>                       [
>>>                         {"name": "min", "type": "long"},
>>>                         {"name": "max", "type": "long"},
>>>                         {"name": "sum", "type": "long"},
>>>                         {"name": "count", "type": "long"}
>>>                       ]
>>>               }
>>>            }
>>>       },
>>>       {"name": "agentType",  "type": ["null", "string"], "default": "APP_AGENT"}
>>>    ]}
>>>
>>> Note the field added and also that a default is defined. The problem is
>>> that if we receive data which was written using the older schema, I get
>>> this error:
>>>
>>> java.io.EOFException: null
>>>     at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139) ~[avro-1.7.4.jar:1.7.4]
>>>     at com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38) ~[blitz-shared.jar:na]
>>>
>>> What I understood from this document
>>> <https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html>
>>> is that this should have been backward compatible, but somehow that
>>> doesn't seem to be the case. Any idea what I am doing wrong?
>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Cloudera, Inc.
>>
>
>


-- 
Swarnim
