I'm having a problem with nesting schemas. A very brief overview of why we're using Avro (successfully so far) is:

o code generation not required
o small binary format
o dynamic use of schemas at runtime

We're doing a flavour of RPC. We're not using Avro's IDL and its own RPC because our endpoints are not necessarily on a Java platform (C# and Java for our purposes), and only the Java implementation of Avro has RPC. Hence no Avro RPC for us.

I'm aware that Avro doesn't import nested schemas out of the box. We need that functionality because we're exposed to schemas over which we have no control and, in the interests of maintainability, these schemas are nicely partitioned and are referenced as types from within other schemas. So, for example, an address schema refers to a some.domain.location object by having a field of type "some.domain.location". Note that our runtime has no knowledge of anything in the some.domain package (e.g. address or location objects). Only the endpoints know about some.domain. (A layer at our endpoint runtime serialises any unknown, i.e. non-primitive, objects as bytestreams.)

I implemented a schema cache which imports schemas on the fly, so adding the address schema to the cache automatically adds the location schema that it refers to. The cache uses Avro's schema parser to parse an added schema, catches parse exceptions, looks at the exception message to see whether the error is due to a missing or undefined type, and if so goes off to import the needed schema. Brittle, I know, but there is no other way for us. We need this functionality, and nothing else comes close to Avro.
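To make the import step concrete, here is a minimal sketch of the same idea done by inspecting the schema JSON directly rather than by reading parser exception messages. It is not our actual cache and it does not use Avro's API at all; the names missing_types, load_order and sources are made up for illustration, and it assumes every schema is available as JSON text keyed by its full name.

```python
import json

# Avro's primitive type names; anything else used as a type is a named reference.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED_KINDS = {"record", "error", "enum", "fixed"}


def fullname(name, namespace):
    """Qualify a short name with the enclosing namespace, if there is one."""
    return name if "." in name or not namespace else f"{namespace}.{name}"


def walk(node, namespace, defined, referenced):
    """Record which named types a schema defines and which it refers to."""
    if isinstance(node, str):
        if node not in PRIMITIVES:
            referenced.add(fullname(node, namespace))
    elif isinstance(node, list):  # a union is a JSON array of branches
        for branch in node:
            walk(branch, namespace, defined, referenced)
    elif isinstance(node, dict):
        kind = node["type"]
        if kind in NAMED_KINDS:
            name = node["name"]
            if "." in name:
                namespace = name.rsplit(".", 1)[0]
            else:
                namespace = node.get("namespace", namespace)
            defined.add(fullname(name, namespace))
            for field in node.get("fields", []):
                walk(field["type"], namespace, defined, referenced)
        elif kind == "array":
            walk(node["items"], namespace, defined, referenced)
        elif kind == "map":
            walk(node["values"], namespace, defined, referenced)
        else:  # e.g. {"type": "string"} or a nested type object
            walk(kind, namespace, defined, referenced)


def missing_types(schema_text):
    """Full names that a schema refers to but does not define itself."""
    defined, referenced = set(), set()
    walk(json.loads(schema_text), None, defined, referenced)
    return referenced - defined


def load_order(root_name, sources):
    """Return full names with dependencies first; `sources` maps name -> JSON text.

    Raises KeyError if a referenced schema is not present in `sources`.
    """
    order, seen = [], set()

    def visit(name):
        if name in seen:  # already handled (this also guards against cycles)
            return
        seen.add(name)
        for dep in sorted(missing_types(sources[name])):
            visit(dep)
        order.append(name)

    visit(root_name)
    return order
```

Given the two schemas in this post, load_order("some.domain.address", sources) would list some.domain.location before some.domain.address, which is the order in which the cache has to make them known to the parser.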

So far so good, until today when I hit a corner case.

Say I have an address object that has two fields, called position1 and position2. If position1 and position2 are both of the same non-primitive type, then the address schema doesn't parse, so presumably it is an invalid Avro schema. The error concerns redefining the location type. Here's the example:

location schema
==============

{
    "name": "location",
    "type": "record",
    "namespace" : "some.domain",
    "fields" :
    [
        {
            "name": "latitude",
            "type": "float"
        },
        {
            "name": "longitude",
            "type": "float"
        }
    ]
}

address schema
==============

{
    "name": "address",
    "type": "record",
    "namespace" : "some.domain",
    "fields" :
    [
        {
            "name": "street",
            "type": "string"
        },
        {
            "name": "city",
            "type": "string"
        },
        {
            "name": "position1",
            "type": "some.domain.location"
        },
        {
            "name": "position2",
            "type": "some.domain.location"
        }
    ]
}
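For what it's worth, my reading of the Avro specification is that a named type may be defined only once within a schema and must be referred to by name afterwards. Under that assumption, the redefinition error would appear if the full location definition were substituted at both position1 and position2. The sketch below shows the alternative of inlining the definition at the first reference only. It is plain Python over the schema JSON, not Avro's API; inline_once and definitions are hypothetical names, and it assumes references use full names and every record carries an explicit namespace, as in the example above.

```python
def inline_once(node, definitions, emitted=None):
    """Expand each named type at its first reference; keep later ones as names.

    `definitions` maps a full name (e.g. "some.domain.location") to its
    already-decoded schema dict. The inputs are not modified.
    """
    emitted = set() if emitted is None else emitted
    if isinstance(node, str):
        if node in definitions and node not in emitted:
            # First sighting: substitute the full definition (marked as
            # emitted by the record branch below).
            return inline_once(definitions[node], definitions, emitted)
        return node  # primitive, unknown, or already defined earlier
    if isinstance(node, list):  # union branches
        return [inline_once(branch, definitions, emitted) for branch in node]
    out = dict(node)  # shallow copy so the caller's dict stays untouched
    kind = out.get("type")
    if kind == "record":
        emitted.add(f'{out["namespace"]}.{out["name"]}')
        out["fields"] = [
            {**field, "type": inline_once(field["type"], definitions, emitted)}
            for field in out["fields"]
        ]
    elif kind == "array":
        out["items"] = inline_once(out["items"], definitions, emitted)
    elif kind == "map":
        out["values"] = inline_once(out["values"], definitions, emitted)
    return out
```

Applied to the address schema, this would leave position1 with the full location record as its type and position2 with the string "some.domain.location", so the type is defined exactly once. Whether that fits our cache depends on it rewriting the schema text before handing it to the parser.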


Now, making the positions a list-valued field is not an answer for us, as we need to solve the general issue of a schema with more than one instance of the same nested type, i.e. my problem is not specific to the address or location schemas.

Can this be done? This is potentially a blocker for us.

cheers,
Peter
