Hi Jeroen, I'm delighted that you like my project.
Avro's restrictions ensure that enum symbols are valid identifiers in most languages. The "alternate names" in my extensions accommodate different representations for different purposes. I've since added another related extension, "ordinals". Only the Proto-to-Avro and Avro-to-Python converters currently understand it. It allows associating ordinals with symbols where the ordinals need not be zero-based or follow symbol order.

I also use a few other extensions that are not yet in my "superset" spec. Some annotations preserve whether a data item came from an XML element or an XML attribute. Others declare SI units for a field in addition to its type.

Avro Schema turns out to be a great foundation for capturing type information in general. If a solid metadata story is the focus, the serialization functionality is the cherry on top.

Clemens

________________________________
From: Jeroen van der Wal <[email protected]>
Sent: Tuesday, September 24, 2024 8:57 PM
To: [email protected] <[email protected]>
Subject: Avrotize, extending or changing Avro spec

I found a project [1] that converts various schema formats to and from Avro. According to the README, Avrotize is a "Rosetta Stone" for data structure definitions. It lets you convert between numerous data and database schema formats and generate code for different programming languages.

In our organization, we receive and send messages in formats like EDI, XML, JSON, and CSV. Describing their payloads with Avro schemas in our distribution and processing layer would greatly simplify our architecture.

After converting some of our schemas with the tool, I observed the following behavior regarding enums:

* Numeric enum values are prefixed with an underscore (e.g., "400" becomes "_400"). This adheres to the Avro specification, but I can't find any explanation of why an enum symbol cannot start with a numeric character.
* Enum descriptions are dropped, because Avro only holds the enum symbol in the schema. The author of Avrotize has extended the Avro spec to address this limitation [2].

I'm curious how the community views the evolution of the Avro spec and its role in new use cases.

Cheers,
Jeroen

[1] https://github.com/clemensv/avrotize
[2] https://github.com/clemensv/avrotize/blob/master/specs/avrotize-schema.md
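The underscore prefix Jeroen describes follows from the Avro specification. It requires every enum symbol to match the regular expression `[A-Za-z_][A-Za-z0-9_]*`, which is the same rule that applies to names. This is what lets symbols map directly onto identifiers in generated code.

The following Python sketch illustrates the check and one possible way to sanitize arbitrary codes such as "400". It also keeps the original values in a side map so they are not lost. `to_avro_symbol` and `build_enum` are hypothetical helpers written for illustration. They are not Avrotize's actual implementation or its extension format.

```python
from __future__ import annotations

import re

# Avro spec rule for enum symbols (same as for names): [A-Za-z_][A-Za-z0-9_]*
AVRO_SYMBOL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_symbol(symbol: str) -> bool:
    """Return True if `symbol` is a legal Avro enum symbol."""
    return AVRO_SYMBOL.fullmatch(symbol) is not None


def to_avro_symbol(value: str) -> str:
    """Hypothetical sanitizer (not Avrotize's code): replace characters
    outside [A-Za-z0-9_] with '_' and prefix '_' when the result is
    empty or starts with a digit."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", value)
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def build_enum(name: str, values: list[str]) -> tuple[dict, dict[str, str]]:
    """Build an Avro enum schema plus a symbol -> original-value map,
    so the original codes (e.g. "400") can be restored on output."""
    symbols: list[str] = []
    original: dict[str, str] = {}
    for value in values:
        symbol = to_avro_symbol(value)
        if symbol in original:
            # Two inputs sanitized to the same symbol; Avro forbids duplicates.
            raise ValueError(
                f"symbol collision: {value!r} and {original[symbol]!r} -> {symbol!r}"
            )
        symbols.append(symbol)
        original[symbol] = value
    return {"type": "enum", "name": name, "symbols": symbols}, original
```

For example, `build_enum("StatusCode", ["200", "400", "NOT-FOUND"])` yields the symbols `["_200", "_400", "NOT_FOUND"]`, and the side map takes `"_400"` back to `"400"`. Sanitizing is lossy, so some mapping like this (or an extension such as the "alternate names" Clemens mentions) is needed to round-trip the original values.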
