Hello!  You might be interested in this short discussion on the dev@
mailing list: 

In short, it appears that the record name is already ignored in
record-to-record matching (at least outside of unions) as an
implementation detail in Java.  I never *did* get around to verifying
the behaviour of the other language implementations, but if this is
what is being done in practice, it's worth clarifying in the spec.

It does seem like a very pragmatic thing to do, and would help with
the CloudEvents Avro use case.  It would be a nice recipe to share in
the docs: the right way to read an envelope from a custom message when
you don't need the payload.
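
As a rough idea of what such a recipe could look like, here is a minimal
Python sketch of your first workaround: copy the CloudEvent reader schema
and give it the writer's unqualified record name. It only manipulates the
schema as a plain dict and calls no Avro library. The function name
reader_schema_for and the example record name are made up for
illustration, and the logicalType is nested inside the type here.

```python
import copy

# CloudEvent reader schema, along the lines of the one in the quoted message.
CLOUD_EVENT_READER = {
    "name": "CloudEvent",
    "type": "record",
    "fields": [{
        "name": "Metadata",
        "type": {
            "type": "record",
            "name": "CloudEvent",
            "namespace": "avro.apache.org",
            "fields": [
                {"name": "id", "type": "string"},
                {"name": "source", "type": "string"},
                {"name": "time",
                 "type": {"type": "long", "logicalType": "timestamp-micros"}},
            ],
        },
    }],
}


def reader_schema_for(writer_schema: dict, reader_schema: dict = CLOUD_EVENT_READER) -> dict:
    """Hypothetical helper: return a copy of reader_schema whose top-level
    record name is the writer's unqualified name, so that name-based
    record matching succeeds."""
    adapted = copy.deepcopy(reader_schema)
    # Resolution compares unqualified names, so drop any namespace prefix.
    adapted["name"] = writer_schema["name"].rsplit(".", 1)[-1]
    return adapted


# Hypothetical writer schema: a CloudEvent-compatible record with a payload field.
writer = {
    "name": "com.example.OrderPlaced",
    "type": "record",
    "fields": CLOUD_EVENT_READER["fields"] + [{"name": "total", "type": "double"}],
}
adapted = reader_schema_for(writer)
```

The adapted schema would then be handed to whatever decoder you use as the
reader schema. That step depends on the library, so I have left it out.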

I'm not sure I understand the third strategy, however!  There aren't
any names in binary data when writing - what would the alias do?
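
To illustrate what I mean, here is a small hand-rolled sketch of the Avro
binary encoding for the Metadata record (two strings and a long). It is
not taken from any Avro library and the function names are my own; it
just follows the encoding rules in the spec: a long is a zig-zag varint,
a string is its byte length as a long followed by UTF-8 bytes, and a
record is its fields' encodings concatenated in schema order.

```python
def encode_long(n: int) -> bytes:
    """Encode a 64-bit signed integer as an Avro zig-zag varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes become small unsigned values
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as its UTF-8 byte length (a long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_metadata(id_: str, source: str, time_micros: int) -> bytes:
    """Encode the Metadata record: field values only, in schema order."""
    return encode_string(id_) + encode_string(source) + encode_long(time_micros)


encoded = encode_metadata("a", "b", 1)
print(encoded.hex())  # → 0261026202
```

Neither the record name nor any field name appears in those five bytes,
which is why I don't see what an alias could change on the writing side.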

(Also, I largely prefer your avro version with explicitly typed
metadata fields and names as well!)

All my best, Ryan

On Wed, Dec 18, 2019 at 5:49 PM roger peppe <rogpe...@gmail.com> wrote:
> Hi,
> Background: I've been contemplating the proposed Avro format in the 
> CloudEvent specification, which defines standard metadata for events. It 
> defines a very generic format for an event that allows storage of almost any 
> data. It seems to me that by going in that direction it's losing almost all 
> the advantages of using Avro in the first place. It feels like it's trying to 
> shoehorn a dynamic message format like JSON into the Avro format, where using 
> Avro itself could do so much better.
> I'm hoping to propose something better. I had what I thought was a nice idea, 
> but it doesn't quite work, and I thought I'd bring up the subject here and 
> see if anyone had some better ideas.
> The schema resolution part of the spec allows a reader to read data that 
> was written with a schema containing extra fields. So, theoretically, we 
> could define a CloudEvent something like this:
> {
>   "name": "CloudEvent",
>   "type": "record",
>   "fields": [{
>     "name": "Metadata",
>     "type": {
>       "type": "record",
>       "name": "CloudEvent",
>       "namespace": "avro.apache.org",
>       "fields": [
>         { "name": "id", "type": "string" },
>         { "name": "source", "type": "string" },
>         { "name": "time", "type": { "type": "long", "logicalType": "timestamp-micros" } }
>       ]
>     }
>   }]
> }
> Theoretically, this could enable any event that's a record that has at least 
> a Metadata field with the above fields to be read generically. The CloudEvent 
> type above could be seen as a structural supertype of all possible 
> more-specific CloudEvent-compatible records that have such a compatible field.
> This has a few nice advantages:
> - there's no need for any wrapping of payload data.
> - the CloudEvent type can evolve over time like any other Avro type.
> - all the data message fields are immediately available alongside the 
> metadata.
> - there's still exactly one schema for a topic, encapsulating both the 
> metadata and the payload.
> However, this idea fails because of one problem - this schema resolution 
> rule: "both schemas are records with the same (unqualified) name". This means 
> that unless everyone names all their CloudEvent-compatible records 
> "CloudEvent", they can't be read like this.
> I don't think people will be willing to name all their records "CloudEvent", 
> so we have a problem.
> I can see a few possible workarounds:
> 1. when reading the record as a CloudEvent, read it with a schema that's the 
>    same as CloudEvent, but with the top level record name changed to the top 
>    level name of the schema that was used to write the record.
> 2. ignore record names when matching schema record types.
> 3. allow aliases to be specified when writing data as well as reading it. When 
>    defining a CloudEvent-compatible event, you'd add a CloudEvent alias to your 
>    record.
> None of the options are particularly nice. 1 is probably the easiest to do, 
> although it means you'd still need some custom logic when decoding records, 
> meaning you couldn't use stock decoders.
> I like the idea of 2, although it gets a bit tricky when dealing with union 
> types. You could define the matching such that it ignores names only when the 
> two matched types are unambiguous (i.e. only one record in both). This could 
> be implemented as an option ("use structural typing") when decoding.
> 3 is probably cleanest but interacts significantly with the spec (for 
> example, the canonical schema transformation strips aliases out, but they'd 
> need to be retained).
> Any thoughts? Is this a silly thing to be contemplating? Is there a better 
> way?
>   cheers,
>     rog.
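
On option 2 in the quoted message, the "only when unambiguous" rule for
unions can be sketched roughly as below. This is a simplified
illustration on plain dict schemas, not how any Avro implementation
resolves schemas: the name structurally_matches is made up, type
promotion is ignored, and non-record complex types are compared by kind
only.

```python
def record_branches(schema):
    """Return the record schemas in a union (or in a single schema)."""
    branches = schema if isinstance(schema, list) else [schema]
    return [b for b in branches if isinstance(b, dict) and b.get("type") == "record"]


def structurally_matches(writer, reader) -> bool:
    """Check whether reader can read writer, ignoring record names."""
    if writer == reader:
        return True
    if isinstance(writer, list) or isinstance(reader, list):
        # Ignore names only when exactly one record appears on each side.
        w, r = record_branches(writer), record_branches(reader)
        return len(w) == 1 and len(r) == 1 and structurally_matches(w[0], r[0])
    w_type = writer if isinstance(writer, str) else writer["type"]
    r_type = reader if isinstance(reader, str) else reader["type"]
    if w_type != "record" or r_type != "record":
        return w_type == r_type  # simplification: no promotion, kind only
    writer_fields = {f["name"]: f["type"] for f in writer["fields"]}
    for field in reader["fields"]:
        if field["name"] in writer_fields:
            if not structurally_matches(writer_fields[field["name"]], field["type"]):
                return False
        elif "default" not in field:
            return False  # reader needs a field the writer never wrote
    return True


metadata = {"type": "record", "name": "CloudEvent", "fields": [
    {"name": "id", "type": "string"},
    {"name": "source", "type": "string"},
    {"name": "time", "type": {"type": "long", "logicalType": "timestamp-micros"}},
]}
reader = {"type": "record", "name": "CloudEvent",
          "fields": [{"name": "Metadata", "type": metadata}]}

# Hypothetical writer: different record names, plus an extra payload field.
writer = {"type": "record", "name": "OrderPlaced", "fields": [
    {"name": "Metadata", "type": dict(metadata, name="Meta")},
    {"name": "total", "type": "double"},
]}
other = {"type": "record", "name": "Other", "fields": []}

ok = structurally_matches(writer, reader)
ambiguous = structurally_matches([writer, other], reader)
```

Here ok is True because the names are ignored and the extra field is
skipped, while ambiguous is False because the writer union holds two
records and the rule declines to guess.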
