I have hacked logical types in my fork to add this capability; if you want to take a look, see: https://github.com/zolyfarkas/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/LogicalType.java#L78
My goal was to make decimal a number in JSON. It is a hack, though: it works but won’t win any beauty contests :-) and right now I don’t see how to make this clean enough to be accepted mainstream. It would be a lot cleaner to elevate these logical types to first-class types and standardize the encoding appropriately. decimal clearly needs to be a first-class type; I am not sure about timestamp-micros...

--Z

> On Jan 16, 2020, at 2:20 PM, roger peppe <rogpe...@gmail.com> wrote:
>
> On Thu, 16 Jan 2020, 18:59 Zoltan Farkas, <zolyfar...@yahoo.com> wrote:
> answers inline
>
>> On Jan 16, 2020, at 5:51 AM, roger peppe <rogpe...@gmail.com> wrote:
>>
>> On Wed, 15 Jan 2020 at 18:51, Zoltan Farkas <zolyfar...@yahoo.com> wrote:
>> What I mean with timestamp-micros, is that it is currently restricted to
>> being bound to long,
>> I see no reason why it should not be allowed to be bound to string as well.
>> (the change should be simple to implement)
>>
>> Wouldn't that have the implication of changing the binary representation too,
>> which is not necessarily desirable (it's bulkier, slower to decode and has
>> more potential error cases)?
>
> yes, it would, but this is how logical types work, and I see no good way to
> change this. (this is what i meant by paying the readability cost in a place
> where it is irrelevant)
>
> So you think that the JSON representation should always match the underlying
> type and ignore the logical type? I can understand the reasoning behind that,
> but it doesn't feel very user friendly in some cases (thinking of decimal and
> duration in particular).
>
> Given their privileged place in the specification, I was thinking that some
> logical types could gain privilege here.
>
> Aside: I'm a bit concerned about the potential for data corruption from
> interchange between timestamp-micros and timestamp-millis, which, as far as
> I understand the spec, look like they'll be treated as compatible with each
> other.
>
>>
>> regarding the media type, something like: application/avro.2+json would be
>> fine.
>>
>> Attaching the ".2" to "avro" rather than "json" seems to be implying a new
>> Avro version, rather than a new JSON-encoding version? Or is the idea that
>> the version number here is implying both the JSON-encoding version and the
>> underlying Avro version? The MIME standard seems to be silent on this
>> AFAICS.
>>
>
> the reason why I would use +json at the end is because it would be a subtype
> suffix: https://en.wikipedia.org/wiki/Media_type#Suffix
> and most browsers will recognize it as json, and potentially format it...
>
> Ah, nice, I wasn't aware of RFC 6838.
>
>>
>> Other than that the proposal looks good. can you start a PR with the spec
>> update?
>>
>> I can do, but I don't hold out much hope of it getting merged. I started a
>> PR with a much more minor change <https://github.com/apache/avro/pull/738>
>> almost 2 months ago and haven't seen any response yet.
>
> Send out an email on the dev mailing list, the committers seem more responsive
> lately...
>
> I'll give it a go :)
>
> cheers,
> rog.
>
>>
>> cheers,
>> rog.
>>
>> --Z
>>
>>> On Jan 15, 2020, at 12:30 PM, roger peppe <rogpe...@gmail.com> wrote:
>>>
>>> On Wed, 15 Jan 2020 at 16:27, Zoltan Farkas <zolyfar...@yahoo.com> wrote:
>>> See comments in-line below:
>>>
>>>> On Jan 15, 2020, at 3:42 AM, roger peppe <rogpe...@gmail.com> wrote:
>>>>
>>>> Oops, I left arrays out! Two other thoughts:
>>>>
>>>> I wonder if it might be worth hedging bets about logical types.
>>>> It would be nice if (for example) a `timestamp-micros` value could be
>>>> encoded as an RFC3339 string, so perhaps that should be allowed for, but
>>>> maybe that's a step too far.
>>> I think logical types should stay above the encoding/decoding…
>>> With timestamp-micros we could extend it to make it applicable to string
>>> and implement the converters, and then in json you would have something
>>> readable, but you would then have the same in binary and pay the
>>> readability cost there as well.
>>>
>>> I'm not sure what you mean there. I wouldn't expect the Avro binary format
>>> to be readable at all.
>>>
>>> I implemented special handling for decimal logical type in my
>>> encoder/decoder, but the best implementation I could do still feels like a
>>> hack...
>>>
>>>> I wonder if there should be some indication of version so that you know
>>>> which JSON encoding version you're reading. Perhaps the Avro schema could
>>>> include a version field (maybe as part of a definition) so you know which
>>>> version of the spec to use when encoding/decoding. Then bet-hedging
>>>> wouldn't be quite as important.
>>> I think Schema needs to stay decoupled from the encoding. The same schema
>>> can be encoded in various ways (I have a csv encoder/decoder for example,
>>> https://demo.spf4j.org/example/records?_Accept=text/csv ).
>>> I think the right abstraction for what you are looking for is the Media
>>> Type (https://en.wikipedia.org/wiki/Media_type ),
>>> It would be helpful to “standardize” the media types for the avro encodings:
>>>
>>> Yes, on reflection, I agree, even though not every possible medium has a
>>> media type. For example, what if we're storing JSON data in a file?
>>> I guess it would be up to us to store the type along with the data, as the
>>> registry message wire format
>>> <https://docs.confluent.io/current/schema-registry/serializer-formatter.html#wire-format>
>>> does, for example by wrapping the entire value in another JSON object.
>>>
>>> Here is what I mean, (with some examples where the same schema is served
>>> with different encodings):
>>>
>>> 1) Binary: “application/avro”
>>> https://demo.spf4j.org/example/records?_Accept=application/avro
>>> 2) Current Json: “application/avro+json”
>>> https://demo.spf4j.org/example/records?_Accept=application/avro-x%2Bjson
>>> <https://demo.spf4j.org/example/records?_Accept=application/avro+json>
>>> 3) New Json: “application/avro-x+json” ?
>>> https://demo.spf4j.org/example/records?_Accept=application/avro-x%2Bjson
>>> <https://demo.spf4j.org/example/records?_Accept=application/avro+json>
>>>
>>> ISTM that "x" isn't a hugely descriptive qualifier there. How about
>>> "application/avro+json.v2" ? Then it's clear what to do if we want to make
>>> another version.
>>>
>>> The media type including the avro schema (like you can see in the response
>>> ContentType in the headers above) can provide complete type information to
>>> be able to read an avro object from a byte stream.
>>>
>>> application/avro-x+json;avsc="{\"type\":\"array\",\"items\":{\"$ref\":\"org.spf4j.demo:jaxrs-spf4j-demo-schema:0.8:b\"}}"
>>>
>>> In HTTP context this fits well with content negotiation, and a client can
>>> ask for a previous version like:
>>>
>>> https://demo.spf4j.org/example/records/1?_Accept=application/json;avsc=%22{\%22$ref\%22:\%22org.spf4j.demo:jaxrs-spf4j-demo-schema:0.4:b\%22}%22
>>> <https://demo.spf4j.org/example/records/1?_Accept=application/json;avsc=%22%7B%5C%22$ref%5C%22:%5C%22org.spf4j.demo:jaxrs-spf4j-demo-schema:0.4:b%5C%22%7D%22>
>>>
>>> Note on $ref, it is an extension to avsc I use to reference schemas from
>>> maven repos. (see
>>> https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroReferences if
>>> interested in more detail)
>>>
>>> Interesting stuff. I like the idea of being able to get the server to check
>>> the desired client encoding, although I'm somewhat wary of the potential
>>> security implications of $ref with arbitrary URLs.
>>>
>>> Apart from the issues you raised, does my description of the proposed
>>> semantics seem reasonable? It could be slightly cleverer and avoid
>>> type-name wrapping in more situations, but this seemed like a nice balance
>>> between easy-to-explain and idiomatic-in-most-situations.
>>>
>>> cheers,
>>> rog.
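
For anyone following the decimal part of this thread, here is a minimal Python sketch of the difference being discussed. It is not the code in the fork linked at the top and not part of any Avro library; the function names are hypothetical. It assumes the spec's layout for the decimal logical type (the bytes hold the big-endian two's-complement unscaled integer, with the scale coming from the schema) and the current JSON encoding of bytes as a string with one character per byte.

```python
import json
from decimal import Decimal


def decimal_from_avro_bytes(raw: bytes, scale: int) -> Decimal:
    """Decode the unscaled two's-complement integer and apply the schema's scale."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


def bytes_as_current_json(raw: bytes) -> str:
    """Current JSON encoding of the underlying bytes: one character per byte."""
    return json.dumps(raw.decode("latin-1"))


def decimal_as_json_number(raw: bytes, scale: int) -> str:
    """Hypothetical alternative: write the decimal as a plain JSON number."""
    return format(decimal_from_avro_bytes(raw, scale), "f")


# 123.45 with scale 2 is stored as the unscaled integer 12345 (0x3039).
raw = (12345).to_bytes(2, byteorder="big", signed=True)
print(bytes_as_current_json(raw))       # "09"
print(decimal_as_json_number(raw, 2))   # 123.45
```

The first output is what a reader of the JSON sees today when the encoding ignores the logical type; the second is the readable form, which only works if the encoder is allowed to look at the logical type and its scale.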