Great, thank you very much, guys, this works. Much appreciated.

On Tue, Feb 2, 2016 at 12:46 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> Raghvendra,
>
> You need to use
>
> DatumReader<ControllerPayload> payloadReader =
>         new SpecificDatumReader<>(SCHEMA_V1, SCHEMA_V2);
>
> So you provide both the writer schema (SCHEMA_V1) and the reader schema
> (SCHEMA_V2) to Avro. In your current case Avro assumes both to be the
> same, which is certainly not the case, and hence it fails. I think this
> is what Ryan was referring to as well.
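>
> For reference, here is a minimal sketch of the resolving read, assuming
> your SCHEMA_V1/SCHEMA_V2 constants and the payload byte array (error
> handling omitted):
>
> // Bytes were written with SCHEMA_V1; resolve them against SCHEMA_V2.
> DatumReader<ControllerPayload> payloadReader =
>         new SpecificDatumReader<>(SCHEMA_V1, SCHEMA_V2);
> Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
> ControllerPayload result = payloadReader.read(null, decoder);
> // Fields missing from SCHEMA_V1 are filled in from SCHEMA_V2 defaults.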
>
> Hope that helps.
>
>
>
> On Tue, Feb 2, 2016 at 1:44 PM, Raghvendra Singh <rsi...@appdynamics.com>
> wrote:
>
>> Hi Ryan
>>
>> Thanks for your answer. Here is what I am doing in my environment:
>>
>> 1. Write the data using the old schema:
>>
>> SpecificDatumWriter<ControllerPayload> datumWriter =
>>         new SpecificDatumWriter<>(SCHEMA_V1);
>>
>> 2. Now try to read the data written with the old schema using the new
>> schema:
>>
>> DatumReader<ControllerPayload> payloadReader =
>>         new SpecificDatumReader<>(SCHEMA_V2);
>>
>> In this case SCHEMA_V1 is the old schema, which doesn't have the field,
>> while SCHEMA_V2 is the new one, which has the extra field.
>>
>> Your suggestion, "You should run setSchema on your SpecificDatumReader
>> to set the schema the data was written with", is kind of a workaround,
>> where I have to read the data with the schema it was written with, and
>> hence this is not exactly backward compatible. Note that if I do this,
>> then I have to maintain all the schemas while reading and somehow know
>> which version the data was written with, which will make schema
>> evolution pretty painful.
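>>
>> In other words, with raw binary payloads I would need something like the
>> sketch below, where the schemaVersionOf() lookup is hypothetical and is
>> exactly the piece I don't have:
>>
>> // Hypothetical registry of every schema version we ever wrote with.
>> Map<Integer, Schema> writerSchemas = new HashMap<>();
>> writerSchemas.put(1, SCHEMA_V1);
>> writerSchemas.put(2, SCHEMA_V2);
>>
>> Schema writtenWith = writerSchemas.get(schemaVersionOf(payload)); // how?
>> DatumReader<ControllerPayload> payloadReader =
>>         new SpecificDatumReader<>(writtenWith, SCHEMA_V2);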
>>
>> Please let me know if I didn't understand your email correctly or there
>> is something I missed.
>>
>> -raghu
>>
>> On Tue, Feb 2, 2016 at 9:19 AM, Ryan Blue <b...@cloudera.com> wrote:
>>
>>> Hi Raghvendra,
>>>
>>> It looks like the problem is that you're using the new schema in place
>>> of the schema that the data was written with.  You should run setSchema on
>>> your SpecificDatumReader to set the schema the data was written with.
>>>
>>> What's happening is that the schema you're using, the new one, has the
>>> new field so Avro assumes it is present and tries to read it. By setting
>>> the schema that the data was actually written with, the datum reader will
>>> know that it isn't present and will use your default instead. When you read
>>> data encoded with the new schema, you need to use it as the written schema
>>> instead so the datum reader knows that the field should be read.
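>>>
>>> Concretely, something like this sketch against your parse method (with
>>> SCHEMA_V1 standing for the schema the bytes were written with):
>>>
>>> SpecificDatumReader<MyPayLoad> payloadReader =
>>>         new SpecificDatumReader<>(MyPayLoad.class); // reader (new) schema
>>> payloadReader.setSchema(SCHEMA_V1); // schema the data was written with
>>> Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
>>> MyPayLoad myPayLoad = payloadReader.read(null, decoder);
>>>
>>> Avro data files store the written schema alongside the data, so readers
>>> get this resolution for free; with raw binary payloads you have to keep
>>> track of the written schema yourself.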
>>>
>>> Does that make sense?
>>>
>>> rb
>>>
>>> On 02/01/2016 12:31 PM, Raghvendra Singh wrote:
>>>
>>>> (Cross-posted at
>>>> http://stackoverflow.com/questions/34733604/avro-schema-doesnt-honor-backward-compatibilty)
>>>>
>>>>
>>>> I have this Avro schema:
>>>>
>>>> {
>>>>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>>   "type": "record",
>>>>   "name": "MyPayLoad",
>>>>   "fields": [
>>>>       {"name": "filed1",  "type": "string"},
>>>>       {"name": "filed2",     "type": "long"},
>>>>       {"name": "filed3",  "type": "boolean"},
>>>>       {
>>>>            "name" : "metrics",
>>>>            "type":
>>>>            {
>>>>               "type" : "array",
>>>>               "items":
>>>>               {
>>>>                   "name": "MyRecord",
>>>>                   "type": "record",
>>>>                   "fields" :
>>>>                       [
>>>>                         {"name": "min", "type": "long"},
>>>>                         {"name": "max", "type": "long"},
>>>>                         {"name": "sum", "type": "long"},
>>>>                         {"name": "count", "type": "long"}
>>>>                       ]
>>>>               }
>>>>            }
>>>>       }
>>>>    ]}
>>>>
>>>> Here is the code we use to parse the data:
>>>>
>>>> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>>>>     DatumReader<MyPayLoad> payloadReader = new SpecificDatumReader<>(MyPayLoad.class);
>>>>     Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
>>>>     MyPayLoad myPayLoad = null;
>>>>     try {
>>>>         myPayLoad = payloadReader.read(null, decoder);
>>>>     } catch (IOException e) {
>>>>         logger.log(Level.SEVERE, e.getMessage(), e);
>>>>     }
>>>>
>>>>     return myPayLoad;
>>>> }
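>>>>
>>>> For context, the bytes are produced roughly like this (a sketch; the
>>>> actual writer code is not shown here):
>>>>
>>>> DatumWriter<MyPayLoad> datumWriter = new SpecificDatumWriter<>(MyPayLoad.class);
>>>> ByteArrayOutputStream out = new ByteArrayOutputStream();
>>>> BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
>>>> datumWriter.write(myPayLoad, encoder);
>>>> encoder.flush();
>>>> byte[] payload = out.toByteArray();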
>>>>
>>>> Now I want to add one more field in the schema, so the schema looks
>>>> like below:
>>>>
>>>>   {
>>>>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>>   "type": "record",
>>>>   "name": "MyPayLoad",
>>>>   "fields": [
>>>>       {"name": "filed1",  "type": "string"},
>>>>       {"name": "filed2",     "type": "long"},
>>>>       {"name": "filed3",  "type": "boolean"},
>>>>       {
>>>>            "name" : "metrics",
>>>>            "type":
>>>>            {
>>>>               "type" : "array",
>>>>               "items":
>>>>               {
>>>>                   "name": "MyRecord",
>>>>                   "type": "record",
>>>>                   "fields" :
>>>>                       [
>>>>                         {"name": "min", "type": "long"},
>>>>                         {"name": "max", "type": "long"},
>>>>                         {"name": "sum", "type": "long"},
>>>>                         {"name": "count", "type": "long"}
>>>>                       ]
>>>>               }
>>>>            }
>>>>       },
>>>>       {"name": "agentType",  "type": ["null", "string"], "default": "APP_AGENT"}
>>>>    ]}
>>>>
>>>> Note the field added and also that the default is defined. The problem
>>>> is that if we receive data which was written using the older schema, I
>>>> get this error:
>>>>
>>>> java.io.EOFException: null
>>>>     at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473) ~[avro-1.7.4.jar:1.7.4]
>>>>     at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128) ~[avro-1.7.4.jar:1.7.4]
>>>>     at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423) ~[avro-1.7.4.jar:1.7.4]
>>>>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229) ~[avro-1.7.4.jar:1.7.4]
>>>>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) ~[avro-1.7.4.jar:1.7.4]
>>>>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206) ~[avro-1.7.4.jar:1.7.4]
>>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152) ~[avro-1.7.4.jar:1.7.4]
>>>>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177) ~[avro-1.7.4.jar:1.7.4]
>>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148) ~[avro-1.7.4.jar:1.7.4]
>>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139) ~[avro-1.7.4.jar:1.7.4]
>>>>     at com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38) ~[blitz-shared.jar:na]
>>>>
>>>> From this document
>>>> <https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html>,
>>>> I understood that this should have been backward compatible, but
>>>> somehow that doesn't seem to be the case. Any idea what I am doing
>>>> wrong?
>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Cloudera, Inc.
>>>
>>
>>
>
>
> --
> Swarnim
>
