Hi Jeroen,

I'm delighted that you like my project.

Avro's restrictions ensure that enum symbols are valid identifiers in most 
languages. The "alternate names" in my extensions accommodate different 
representations for different purposes. In the meantime, I've added a related 
extension, "ordinals", which allows associating explicit ordinals with symbols 
instead of the default zero-based ordinals in symbol order. It is currently 
understood only by the Proto-to-Avro converter and the Avro-to-Python converter.
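
To illustrate the identifier restriction, here is a minimal Python sketch. The 
name pattern is the one given in the Avro specification; the helper 
to_avro_symbol is a hypothetical name used only for this example and is not 
Avrotize's actual code.

```python
import re

# Name pattern from the Avro specification: a letter or underscore,
# followed by any number of letters, digits, or underscores.
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def to_avro_symbol(value: str) -> str:
    """Hypothetical helper: map an arbitrary enum value to a valid Avro symbol."""
    # Replace every character that is not allowed in an Avro name.
    symbol = re.sub(r"[^A-Za-z0-9_]", "_", value)
    # A symbol must not be empty or start with a digit, so prefix an underscore.
    if not symbol or symbol[0].isdigit():
        symbol = "_" + symbol
    return symbol


print(to_avro_symbol("400"))                  # "400" is not a valid symbol as-is
print(bool(AVRO_NAME.fullmatch("400")))       # checks the raw value
print(bool(AVRO_NAME.fullmatch(to_avro_symbol("400"))))  # checks the mapped value
```

A mapping like this loses the original value, which is the gap the "alternate 
names" extension is meant to cover.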

I also use a few other extensions that are not yet in my "superset" spec, 
such as annotations that preserve whether a data item came from an XML element 
or an attribute, or that declare SI units in addition to a type for a field.

Avro Schema turns out to be a great foundation for capturing type information 
in general. The serialization functionality is a cherry on top if a solid 
metadata story is the focus.

Clemens


________________________________
From: Jeroen van der Wal <[email protected]>
Sent: Tuesday, September 24, 2024 8:57 PM
To: [email protected] <[email protected]>
Subject: Avrotize, extending or changing Avro spec

I found a project [1] that converts various schema formats to and from Avro. 
According to the README, Avrotize is a "Rosetta Stone" for data structure 
definitions, allowing you to convert between numerous data and database schema 
formats and generate code for different programming languages.

In our organization, we receive and send messages in formats like EDI, XML, 
JSON, and CSV. Having their payloads in an Avro schema in our distribution and 
processing layer would greatly simplify our architecture.

After converting some of our schemas using the tool, I observed some behavior 
regarding enums:

  *   Enums that contain numeric values are prefixed with an underscore (e.g., 
"400" becomes "_400"). This adheres to the Avro specification, but I can't find 
any explanation as to why an enum symbol cannot start with a numeric character.
  *   Enum descriptions are dropped, as Avro only holds the enum symbol in the 
schema. The author of Avrotize has extended the Avro spec to address this 
limitation [2].

I'm curious about how the community views the evolution of the Avro spec and 
its role in new use cases.

Cheers,
Jeroen

[1] https://github.com/clemensv/avrotize
[2] https://github.com/clemensv/avrotize/blob/master/specs/avrotize-schema.md