Ivan Zemlyanskiy created AVRO-3408:
--------------------------------------

             Summary: Schema evolution with logical types 
                 Key: AVRO-3408
                 URL: https://issues.apache.org/jira/browse/AVRO-3408
             Project: Apache Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.11.0
            Reporter: Ivan Zemlyanskiy


Hello!

First of all, thank you for this project. I love Avro encoding from both a 
technology and a code-culture point of view. (y)

I know you recommend evolving a schema by adding a new field and removing the 
old one later, but please-please-please consider my case as well. 

In my company, we have some DTOs with about 200+ fields in total that we 
encode with Avro and send over the network. About a third of them have the type 
`java.math.BigDecimal`. At some point, we discovered we were sending them with 
a schema like
{code:json}
{
  "name":"performancePrice",
  "type":{
    "type":"string",
    "java-class":"java.math.BigDecimal"
  }
}
{code}
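
For context, here is a minimal sketch of how such a schema shows up with plain 
reflection (the DTO name and field are made up, not our real classes); as far 
as we can tell, `ReflectData` treats `BigDecimal` as a "stringable" class by 
default:
{code:java}
import java.math.BigDecimal;

import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;

// Hypothetical DTO, only to illustrate where the schema above comes from.
public class PriceDto {
    BigDecimal performancePrice;

    public static void main(String[] args) {
        // With plain ReflectData, the BigDecimal field ends up as a string schema
        // carrying the "java-class":"java.math.BigDecimal" property.
        Schema schema = ReflectData.get().getSchema(PriceDto.class);
        System.out.println(schema.toString(true));
    }
}
{code}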
That is kind of a disaster for us because we run at a pretty high load, around 
~2 million RPS. 
So we started thinking about migrating to something lighter than strings (no 
blame for choosing string as the default, I know BigDecimal has a lot of 
pitfalls, and string is the easiest way to encode/decode it).
It was acceptable for us to settle on a standard precision for all such fields, 
so we found `Conversions.DecimalConversion` and decided that, at the end of the 
day, we would use this logical type with a recommended schema like
{code:java}
    @Override
    public Schema getRecommendedSchema() {
        Schema schema = Schema.create(Schema.Type.BYTES);
        LogicalTypes.Decimal decimalType =
                LogicalTypes.decimal(MathContext.DECIMAL32.getPrecision(),
                        DecimalUtils.MONEY_ROUNDING_SCALE);
        decimalType.addToSchema(schema);
        return schema;
    }
{code}
(we use `org.apache.avro.reflect.ReflectData`)
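
For completeness, here is a rough sketch of the whole conversion as we register 
it. The class name is ours, and the scale constant below is just a placeholder 
standing in for `DecimalUtils.MONEY_ROUNDING_SCALE`:
{code:java}
import java.math.MathContext;

import org.apache.avro.Conversions;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;

// Reuse the byte-array encoding from Conversions.DecimalConversion and only
// override the recommended schema with a fixed precision/scale.
public class BigDecimalConversion extends Conversions.DecimalConversion {

    // Placeholder for DecimalUtils.MONEY_ROUNDING_SCALE; the real value is project-specific.
    private static final int MONEY_ROUNDING_SCALE = 4;

    @Override
    public Schema getRecommendedSchema() {
        Schema schema = Schema.create(Schema.Type.BYTES);
        LogicalTypes.decimal(MathContext.DECIMAL32.getPrecision(), MONEY_ROUNDING_SCALE)
                .addToSchema(schema);
        return schema;
    }

    public static ReflectData reflectDataWithConversion() {
        // Registration, e.g. at application startup.
        ReflectData reflectData = new ReflectData();
        reflectData.addLogicalTypeConversion(new BigDecimalConversion());
        return reflectData;
    }
}
{code}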

It all looks good and promising, but the question is how to migrate to such a 
schema. 
As I said, we have a lot of such fields, and migrating all of them by 
duplicating fields and removing the old ones later would be painful and would 
cost us considerable overhead.
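
For the record, this is roughly what the recommended migration would look like 
per field, sketched with `SchemaBuilder` (field names and precision/scale are 
made up); multiplied across dozens of `BigDecimal` fields, that is the overhead 
we would like to avoid:
{code:java}
import java.math.BigDecimal;

import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.specific.SpecificData;

public class DuplicateFieldMigrationSketch {
    public static Schema migratingSchema() {
        // Old representation: string with the java-class property, kept for readers
        // that have not migrated yet.
        Schema oldString = Schema.create(Schema.Type.STRING);
        oldString.addProp(SpecificData.CLASS_PROP, BigDecimal.class.getName());

        // New representation: bytes + decimal logical type.
        Schema decimalBytes = LogicalTypes.decimal(7, 2).addToSchema(Schema.create(Schema.Type.BYTES));

        return SchemaBuilder.record("PriceDto").fields()
                // old field stays for now and gets removed in a later release
                .name("performancePrice").type(oldString).noDefault()
                // new field; in practice it would be made nullable with a null default
                // so that data written before the migration still resolves
                .name("performancePriceDecimal").type(decimalBytes).noDefault()
                .endRecord();
    }
}
{code}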

I ran some tests and found that if two applications register the same 
`BigDecimalConversion`, but in one application `getRecommendedSchema()` looks 
like the method above while in the other application `getRecommendedSchema()` 
is
{code:java}
    @Override
    public Schema getRecommendedSchema() {
        Schema schema = Schema.create(Schema.Type.STRING);
        schema.addProp(SpecificData.CLASS_PROP, BigDecimal.class.getName());
        return schema;
    }
{code}
then they can easily read each other's messages using the _SERVER_ schema.
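
To illustrate what I mean by "using the _SERVER_ schema", here is a minimal 
sketch with generic data rather than our real RPC setup (schema parameters, 
names, and values are made up): as long as the reader decodes with the writer's 
actual schema and has the decimal conversion registered, the `BigDecimal` comes 
back just fine.
{code:java}
import java.io.ByteArrayOutputStream;
import java.math.BigDecimal;

import org.apache.avro.Conversions;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class ServerSchemaReadSketch {
    public static void main(String[] args) throws Exception {
        // "Server" schema: bytes + decimal logical type, as recommended by the new conversion.
        Schema decimalBytes = LogicalTypes.decimal(7, 2).addToSchema(Schema.create(Schema.Type.BYTES));
        Schema serverSchema = SchemaBuilder.record("PriceDto").fields()
                .name("performancePrice").type(decimalBytes).noDefault()
                .endRecord();

        // Both sides register the same decimal conversion.
        GenericData data = new GenericData();
        data.addLogicalTypeConversion(new Conversions.DecimalConversion());

        // Server encodes a BigDecimal (its scale matches the schema's scale of 2).
        GenericRecord record = new GenericData.Record(serverSchema);
        record.put("performancePrice", new BigDecimal("1234.56"));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(serverSchema, data).write(record, encoder);
        encoder.flush();

        // Client decodes with the server's actual schema: the registered conversion
        // kicks in and a BigDecimal comes back, even if the client's own recommended
        // schema for BigDecimal is still the string form.
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord decoded =
                new GenericDatumReader<GenericRecord>(serverSchema, serverSchema, data).read(null, decoder);
        System.out.println(decoded.get("performancePrice")); // 1234.56 as a BigDecimal
    }
}
{code}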

So I made two applications and wired them up with `ProtocolRepository`, 
`ReflectResponder`, and all that stuff, and found out it doesn't work, because 
`org.apache.avro.io.ResolvingDecoder` totally ignores logical types for some 
reason. 
So one application explicitly says "I encode this field as a byte array which 
is supposed to be the logical type 'decimal' with precision N", but the other 
application just tries to convert those bytes to a string and build a 
BigDecimal from the resulting string. As a result, we got
{code:java}
java.lang.NumberFormatException: Character ' is neither a decimal digit number, decimal point, nor "e" notation exponential mark.
{code}
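
For what it's worth, the mismatch reproduces outside the RPC stack with just a 
datum writer/reader and the two schemas. A rough sketch (precision/scale and 
the value are made up, and the exception may arrive wrapped in an Avro runtime 
exception):
{code:java}
import java.io.ByteArrayOutputStream;
import java.math.BigDecimal;

import org.apache.avro.Conversions;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumReader;
import org.apache.avro.specific.SpecificData;

public class ResolvingMismatchSketch {
    public static void main(String[] args) throws Exception {
        // Writer ("server") schema: bytes + decimal, as recommended by the new conversion.
        Schema writerSchema = LogicalTypes.decimal(7, 2).addToSchema(Schema.create(Schema.Type.BYTES));

        // Reader ("client") schema: the old string form with the java-class property.
        Schema readerSchema = Schema.create(Schema.Type.STRING);
        readerSchema.addProp(SpecificData.CLASS_PROP, BigDecimal.class.getName());

        // Server side encodes a BigDecimal through the decimal conversion.
        GenericData writerData = new GenericData();
        writerData.addLogicalTypeConversion(new Conversions.DecimalConversion());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<BigDecimal>(writerSchema, writerData).write(new BigDecimal("1234.56"), encoder);
        encoder.flush();

        // Client side resolves writer bytes -> reader string (a legal promotion),
        // ignores the decimal logical type, and then tries to build a BigDecimal
        // from the raw bytes read back as text, which blows up with the
        // NumberFormatException shown above.
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        new ReflectDatumReader<BigDecimal>(writerSchema, readerSchema, new ReflectData()).read(null, decoder);
    }
}
{code}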

In my humble opinion, `org.apache.avro.io.ResolvingDecoder` should respect the 
logical types in the _SERVER_ (_ACTUAL_) schema and use the corresponding 
conversion instance when reading values. In my example, I'd say it might be 
{code}
ResolvingDecoder#readString()
  -> read the actual logical type
  -> find BigDecimalConversion instance
  -> conversion.fromBytes(readValueWithActualSchema())
  -> conversion.toCharSequence(readValueWithConversion)
{code}

Thank you in advance for your time, and sorry for the long post. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
