Re: Avro schema doesn't honor backward compatibilty

Ryan Blue Tue, 02 Feb 2016 09:21:01 -0800

Hi Raghvendra,

It looks like the problem is that you're using the new schema in place of the schema that the data was written with. You should run setSchema on your SpecificDatumReader to set the schema the data was written with.

What's happening is that the schema you're using, the new one, has the new field so Avro assumes it is present and tries to read it. By setting the schema that the data was actually written with, the datum reader will know that it isn't present and will use your default instead. When you read data encoded with the new schema, you need to use it as the written schema instead so the datum reader knows that the field should be read.
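
As a rough illustration of the point above, here is a minimal sketch that uses the generic API rather than your generated MyPayLoad class. The class name SchemaResolutionSketch, the helper methods, and the cut-down one-field schemas are assumptions made up for this example, not your real code. It also declares agentType as ["string", "null"] instead of ["null", "string"], on the understanding that the Avro specification expects a union field's default to match the first branch of the union.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class SchemaResolutionSketch {
    // Hypothetical, simplified stand-in for the schema the old data was written with.
    static final Schema OLD_SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"MyPayLoad\",\"fields\":["
        + "{\"name\":\"filed1\",\"type\":\"string\"}]}");

    // Hypothetical, simplified stand-in for the new schema with the added field.
    // Assumption: "string" is listed first so the string default is valid.
    static final Schema NEW_SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"MyPayLoad\",\"fields\":["
        + "{\"name\":\"filed1\",\"type\":\"string\"},"
        + "{\"name\":\"agentType\",\"type\":[\"string\",\"null\"],"
        + "\"default\":\"APP_AGENT\"}]}");

    // Encode one record exactly as an old producer would, with the old schema.
    static byte[] writeWithOldSchema(String value) throws IOException {
        GenericRecord record = new GenericData.Record(OLD_SCHEMA);
        record.put("filed1", value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(OLD_SCHEMA).write(record, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    // Decode with an explicit writer schema; the reader (expected) schema is always the new one.
    static GenericRecord read(byte[] payload, Schema writerSchema) throws IOException {
        GenericDatumReader<GenericRecord> reader =
            new GenericDatumReader<>(writerSchema, NEW_SCHEMA);
        Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
        return reader.read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        byte[] oldPayload = writeWithOldSchema("hello");
        // The writer schema says agentType is absent, so the default is filled in.
        GenericRecord record = read(oldPayload, OLD_SCHEMA);
        System.out.println(record.get("filed1") + " " + record.get("agentType"));
    }
}
```

For data encoded with the new schema you would pass NEW_SCHEMA as the writer schema instead. With generated classes, the equivalent should be calling setSchema on the SpecificDatumReader with the schema the data was written with, which means you need some way to know which schema each payload was written with.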


Does that make sense?

rb

On 02/01/2016 12:31 PM, Raghvendra Singh wrote:


I have this avro schema

{
  "namespace": "xx.xxxx.xxxxx.xxxxx",
  "type": "record",
  "name": "MyPayLoad",
  "fields": [
      {"name": "filed1",  "type": "string"},
      {"name": "filed2",     "type": "long"},
      {"name": "filed3",  "type": "boolean"},
      {
           "name" : "metrics",
           "type":
           {
              "type" : "array",
              "items":
              {
                  "name": "MyRecord",
                  "type": "record",
                  "fields" :
                      [
                        {"name": "min", "type": "long"},
                        {"name": "max", "type": "long"},
                        {"name": "sum", "type": "long"},
                        {"name": "count", "type": "long"}
                      ]
              }
           }
      }
   ]}

Here is the code which we use to parse the data

public static final MyPayLoad parseBinaryPayload(byte[] payload) {
    DatumReader<MyPayLoad> payloadReader = new SpecificDatumReader<>(MyPayLoad.class);
    Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
    MyPayLoad myPayLoad = null;
    try {
        myPayLoad = payloadReader.read(null, decoder);
    } catch (IOException e) {
        logger.log(Level.SEVERE, e.getMessage(), e);
    }

    return myPayLoad;
}

Now I want to add one more field to the schema, so the schema looks like
this:

  {
  "namespace": "xx.xxxx.xxxxx.xxxxx",
  "type": "record",
  "name": "MyPayLoad",
  "fields": [
      {"name": "filed1",  "type": "string"},
      {"name": "filed2",     "type": "long"},
      {"name": "filed3",  "type": "boolean"},
      {
           "name" : "metrics",
           "type":
           {
              "type" : "array",
              "items":
              {
                  "name": "MyRecord",
                  "type": "record",
                  "fields" :
                      [
                        {"name": "min", "type": "long"},
                        {"name": "max", "type": "long"},
                        {"name": "sum", "type": "long"},
                        {"name": "count", "type": "long"}
                      ]
              }
           }
      },
      {"name": "agentType",  "type": ["null", "string"], "default": "APP_AGENT"}
   ]}

Note the added field, and that a default is defined for it. The problem is
that when we receive data that was written using the older schema, I get
this error:

java.io.EOFException: null
     at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473) ~[avro-1.7.4.jar:1.7.4]
     at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128) ~[avro-1.7.4.jar:1.7.4]
     at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423) ~[avro-1.7.4.jar:1.7.4]
     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229) ~[avro-1.7.4.jar:1.7.4]
     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) ~[avro-1.7.4.jar:1.7.4]
     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206) ~[avro-1.7.4.jar:1.7.4]
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152) ~[avro-1.7.4.jar:1.7.4]
     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177) ~[avro-1.7.4.jar:1.7.4]
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148) ~[avro-1.7.4.jar:1.7.4]
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139) ~[avro-1.7.4.jar:1.7.4]
     at com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38) ~[blitz-shared.jar:na]

What I understood from this document
<https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html>
is that this change should have been backward compatible, but somehow that
doesn't seem to be the case. Any idea what I am doing wrong?



--
Ryan Blue
Software Engineer
Cloudera, Inc.
