I'm having a problem with nesting schemas. A very brief overview of why
we're using Avro (successfully so far) is:
o code generation not required
o small binary format
o dynamic use of schemas at runtime
We're doing a flavour of RPC, but we're not using Avro's IDL and
flavour of RPC, because the endpoint is not necessarily a Java
platform (C# and Java for our purposes) and only the Java
implementation of Avro has RPC. Hence no Avro RPC for us.
I'm aware that Avro doesn't import nested schemas out of the box. We
need that functionality because we're exposed to schemas over which we
have no control, and in the interests of maintainability these schemas
are nicely partitioned and referenced as types from within other
schemas. For example, an address schema refers to a
some.domain.location object by having a field of type
"some.domain.location". Note that our runtime has no knowledge of any
some.domain package (e.g. address or location objects); only the
endpoints know about some.domain. (A layer at our endpoint runtime
serialises any unknown, i.e. non-primitive, objects as bytestreams.)
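For illustration, a field type such as "some.domain.location" can be mapped to the fullname the cache has to look up using the naming rules in the Avro specification: primitive type names are never namespaced, a name containing a dot is already a fullname, and a bare name takes the enclosing namespace. This is a minimal sketch; the class and method names (AvroNames, resolve, needsImport) are hypothetical and not part of Avro's API.

```java
import java.util.Set;

final class AvroNames {
    // Primitive type names from the Avro specification; these are never namespaced.
    static final Set<String> PRIMITIVES =
        Set.of("null", "boolean", "int", "long", "float", "double", "bytes", "string");

    /**
     * Resolves a type reference to a fullname (hypothetical helper): a dotted name is
     * already a fullname; a bare name is qualified with the enclosing namespace, if any.
     */
    static String resolve(String typeRef, String enclosingNamespace) {
        if (PRIMITIVES.contains(typeRef) || typeRef.contains(".")) {
            return typeRef;
        }
        if (enclosingNamespace == null || enclosingNamespace.isEmpty()) {
            return typeRef;
        }
        return enclosingNamespace + "." + typeRef;
    }

    /** True if the reference names a non-primitive type the cache doesn't know yet. */
    static boolean needsImport(String typeRef, Set<String> knownFullNames, String enclosingNamespace) {
        return !PRIMITIVES.contains(typeRef)
            && !knownFullNames.contains(resolve(typeRef, enclosingNamespace));
    }

    public static void main(String[] args) {
        System.out.println(resolve("some.domain.location", "some.domain")); // some.domain.location
        System.out.println(resolve("location", "some.domain"));             // some.domain.location
        System.out.println(resolve("float", "some.domain"));                // float
        System.out.println(needsImport("location", Set.of(), "some.domain")); // true
    }
}
```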
I implemented a schema cache which intelligently imports schemas on the
fly, so adding the address schema to the cache automatically adds the
location schema that it refers to. The cache uses Avro's Schema class to
parse an added schema, catches parse exceptions, inspects the exception
message to see whether the error is due to a missing or undefined type,
and if so goes off to import the needed schema. Brittle, I know, but we
have no other option. We need this functionality, and nothing else
comes close to Avro.
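The parse/catch/import/retry loop described above can be sketched roughly as follows. This is a hypothetical illustration, not the cache itself: the Parser interface, the SchemaParseException stand-in and the exact "Undefined name: ..." message format are all assumptions, and the toy parser in main only checks whether the types a schema refers to are already known.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SchemaCache {
    /** Stand-in for the parser's exception type (hypothetical). */
    static final class SchemaParseException extends RuntimeException {
        SchemaParseException(String message) { super(message); }
    }

    /** Hypothetical parser hook: throws if the named schema refers to a type not in knownTypes. */
    interface Parser {
        void parse(String fullName, Set<String> knownTypes);
    }

    // ASSUMPTION: a missing-type error reads like:  Undefined name: "some.domain.location"
    private static final Pattern UNDEFINED = Pattern.compile("Undefined name: \"?([\\w.]+)\"?");

    private final Parser parser;
    private final Set<String> known = new LinkedHashSet<>();

    SchemaCache(Parser parser) { this.parser = parser; }

    /** Adds a schema, first importing any undefined types it refers to. */
    void add(String fullName) {
        add(fullName, new LinkedHashSet<>());
    }

    private void add(String fullName, Set<String> inProgress) {
        if (known.contains(fullName)) return;
        if (!inProgress.add(fullName)) {
            throw new IllegalStateException("Circular import involving " + fullName);
        }
        while (true) {
            try {
                parser.parse(fullName, known);
                known.add(fullName);                   // parsed cleanly: cache it
                inProgress.remove(fullName);
                return;
            } catch (SchemaParseException e) {
                Matcher m = UNDEFINED.matcher(e.getMessage());
                if (!m.find()) throw e;                // some other parse error: give up
                String missing = m.group(1);
                if (known.contains(missing)) throw e;  // importing didn't help: don't loop forever
                add(missing, inProgress);              // import the dependency, then retry
            }
        }
    }

    Set<String> knownTypes() { return known; }

    public static void main(String[] args) {
        // Toy schema store: each schema lists the named types it references.
        Map<String, List<String>> deps = Map.of(
            "some.domain.address", List.of("some.domain.location"),
            "some.domain.location", List.of());
        SchemaCache cache = new SchemaCache((name, knownTypes) -> {
            for (String dep : deps.get(name)) {
                if (!knownTypes.contains(dep)) {
                    throw new SchemaParseException("Undefined name: \"" + dep + "\"");
                }
            }
        });
        cache.add("some.domain.address");
        System.out.println(cache.knownTypes()); // [some.domain.location, some.domain.address]
    }
}
```

The in-progress set is there so that two schemas referring to each other fail loudly instead of recursing forever; the "already known" check stops the retry loop if importing a type doesn't make the error go away.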
So far so good, until today when I hit a corner case.
Say I have an address object with two fields, position1 and position2.
If position1 and position2 are both of the same non-primitive type, the
address schema doesn't parse, so presumably it is an invalid Avro schema.
The error concerns redefining the location type. Here's the example:
location schema
==============
{
  "name": "location",
  "type": "record",
  "namespace": "some.domain",
  "fields":
  [
    {
      "name": "latitude",
      "type": "float"
    },
    {
      "name": "longitude",
      "type": "float"
    }
  ]
}
address schema
==============
{
  "name": "address",
  "type": "record",
  "namespace": "some.domain",
  "fields":
  [
    {
      "name": "street",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    },
    {
      "name": "position1",
      "type": "some.domain.location"
    },
    {
      "name": "position2",
      "type": "some.domain.location"
    }
  ]
}
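For reference, the Avro specification says a schema may not contain multiple definitions of a fullname. If the cache builds the combined schema by pasting the full location definition into every field that refers to it (an assumption; the exact expansion isn't shown here), some.domain.location would end up defined twice. A minimal sketch of an "inline once" expansion, where the first use carries the definition and later uses refer to the fullname; the InlineOnce, Field and RecordSchema types are hypothetical stand-ins for already-parsed schemas, and the JSON is built by hand without escaping.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

final class InlineOnce {
    record Field(String name, String type) {}

    record RecordSchema(String namespace, String name, List<Field> fields) {
        String fullName() { return namespace + "." + name; }
    }

    static final Set<String> PRIMITIVES =
        Set.of("null", "boolean", "int", "long", "float", "double", "bytes", "string");

    /**
     * Expands a record to JSON, inlining each named type at its first use and
     * referring to it by fullname afterwards (which also covers self-references).
     */
    static String expand(String fullName, Map<String, RecordSchema> registry, Set<String> defined) {
        if (PRIMITIVES.contains(fullName)) return "\"" + fullName + "\"";
        if (!defined.add(fullName)) return "\"" + fullName + "\"";  // already defined: reference it
        RecordSchema r = registry.get(fullName);
        if (r == null) throw new IllegalArgumentException("Unknown type: " + fullName);
        StringJoiner fields = new StringJoiner(",", "[", "]");
        for (Field f : r.fields()) {
            fields.add("{\"name\":\"" + f.name() + "\",\"type\":"
                + expand(f.type(), registry, defined) + "}");
        }
        return "{\"type\":\"record\",\"name\":\"" + r.name() + "\",\"namespace\":\""
            + r.namespace() + "\",\"fields\":" + fields + "}";
    }

    public static void main(String[] args) {
        RecordSchema location = new RecordSchema("some.domain", "location",
            List.of(new Field("latitude", "float"), new Field("longitude", "float")));
        RecordSchema address = new RecordSchema("some.domain", "address",
            List.of(new Field("street", "string"), new Field("city", "string"),
                    new Field("position1", "some.domain.location"),
                    new Field("position2", "some.domain.location")));
        Map<String, RecordSchema> registry =
            Map.of(location.fullName(), location, address.fullName(), address);
        // position1 carries the full location record; position2 is just "some.domain.location".
        System.out.println(expand(address.fullName(), registry, new HashSet<>()));
    }
}
```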
Now, replacing the two fields with a single list of positions is not an
answer for us, as we need to solve the general problem of a schema with
more than one field of the same nested type, i.e. my problem is not
specific to an address or location schema.
Can this be done? This is potentially a blocker for us.
cheers,
Peter