On Wed, 15 Jan 2020 at 18:51, Zoltan Farkas <zolyfar...@yahoo.com> wrote:
> What I mean with timestamp-micros, is that it is currently restricted to
> being bound to long,
> I see no reason why it should not be allowed to be bound to string as
> well. (the change should be simple to implement)
>

Wouldn't that have the implication of changing the binary representation
too, which is not necessarily desirable (it's bulkier, slower to decode and
has more potential error cases)?

> regarding the media type, something like: application/avro.2+json would be
> fine.
>

Attaching the ".2" to "avro" rather than "json" seems to be implying a new
Avro version, rather than a new JSON-encoding version? Or is the idea that
the version number here is implying both the JSON-encoding version *and* the
underlying Avro version? The MIME standard seems to be silent on this AFAICS.

> Other than that the proposal looks good. can you start a PR with the spec
> update?
>

I can do, but I don't hold out much hope of it getting merged. I started a
PR with a much more minor change <https://github.com/apache/avro/pull/738>
almost 2 months ago and haven't seen any response yet.

  cheers,
    rog.

>
> --Z
>
> On Jan 15, 2020, at 12:30 PM, roger peppe <rogpe...@gmail.com> wrote:
>
> On Wed, 15 Jan 2020 at 16:27, Zoltan Farkas <zolyfar...@yahoo.com> wrote:
>
>> See comments in-line below:
>>
>> On Jan 15, 2020, at 3:42 AM, roger peppe <rogpe...@gmail.com> wrote:
>>
>> Oops, I left arrays out! Two other thoughts:
>>
>>
>>    - I wonder if it might be worth hedging bets about logical types. It
>>    would be nice if (for example) a `timestamp-micros` value could be encoded
>>    as an RFC3339 string, so perhaps that should be allowed for, but maybe
>>    that's a step too far.
>>
>> I think logical types should stay above the encoding/decoding…
>> With timestamp-micros we could extend it to make it applicable to string
>> and implement the converters, and then in json you would have something
>> readable, but you would then have the same in binary and pay the
>> readability cost there as well.
>>
>
> I'm not sure what you mean there. I wouldn't expect the Avro binary format
> to be readable at all.
>
>> I implemented special handling for decimal logical type in my
>> encoder/decoder, but the best implementation I could do still feels like a
>> hack...
>>
>>
>>    - I wonder if there should be some indication of version so that you
>>    know which JSON encoding version you're reading. Perhaps the Avro schema
>>    could include a version field (maybe as part of a definition) so you know
>>    which version of the spec to use when encoding/decoding. Then bet-hedging
>>    wouldn't be quite as important.
>>
>> I think Schema needs to stay decoupled from the encoding. The same schema
>> can be encoded in various ways (I have a csv encoder/decoder for example,
>> https://demo.spf4j.org/example/records?_Accept=text/csv ).
>> I think the right abstraction for what you are looking for is the Media
>> Type (https://en.wikipedia.org/wiki/Media_type ),
>> It would be helpful to “standardize” the media types for the avro
>> encodings:
>>
>
> Yes, on reflection, I agree, even though not every possible medium has a
> media type. For example, what if we're storing JSON data in a file? I guess
> it would be up to us to store the type along with the data, as the registry
> message wire format
> <https://docs.confluent.io/current/schema-registry/serializer-formatter.html#wire-format>
> does, for example by wrapping the entire value in another JSON object.
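
To make the "wrap the entire value in another JSON object" idea concrete, here is a minimal Python sketch of such an envelope. It is only an illustration: the field names `mediaType` and `value`, and the helpers `wrap` and `unwrap`, are made up for this example and are not defined by the Avro spec or by the registry wire format linked above.

```python
import json

# Hypothetical envelope field names; nothing in the Avro spec defines these.
MEDIA_TYPE_FIELD = "mediaType"
VALUE_FIELD = "value"


def wrap(value, media_type):
    """Return a JSON document that stores the encoding's media type with the value."""
    return json.dumps({MEDIA_TYPE_FIELD: media_type, VALUE_FIELD: value})


def unwrap(document, supported):
    """Parse an envelope and return (media_type, value).

    Raises ValueError if the document is not an envelope or names an
    encoding that the caller does not support.
    """
    envelope = json.loads(document)
    if not isinstance(envelope, dict) or set(envelope) != {MEDIA_TYPE_FIELD, VALUE_FIELD}:
        raise ValueError("not a media-type envelope")
    media_type = envelope[MEDIA_TYPE_FIELD]
    if media_type not in supported:
        raise ValueError(f"unsupported encoding: {media_type}")
    return media_type, envelope[VALUE_FIELD]


if __name__ == "__main__":
    doc = wrap({"name": "example"}, "application/avro+json")
    print(unwrap(doc, supported={"application/avro+json"}))
```

A reader of the file can then pick a decoder from the stored media type before touching the payload, which is the role the Content-Type header plays over HTTP.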
>
>
>> Here is what I mean, (with some examples where the same schema is served
>> with different encodings):
>>
>> 1) Binary: “application/avro”
>> https://demo.spf4j.org/example/records?_Accept=application/avro
>> 2) Current Json: “application/avro+json”
>> https://demo.spf4j.org/example/records?_Accept=application/avro-x%2Bjson
>> <https://demo.spf4j.org/example/records?_Accept=application/avro+json>
>> 3) New Json: “application/avro-x+json” ?
>> https://demo.spf4j.org/example/records?_Accept=application/avro-x%2Bjson
>> <https://demo.spf4j.org/example/records?_Accept=application/avro+json>
>>
>
> ISTM that "x" isn't a hugely descriptive qualifier there. How about
> "application/avro+json.v2"? Then it's clear what to do if we want to make
> another version.
>
>
>
>> The media type including the avro schema (like you can see in the
>> response ContentType in the headers above) can provide complete type
>> information to be able to read an avro object from a byte stream.
>>
>>
>> application/avro-x+json;avsc="{\"type\":\"array\",\"items\":{\"$ref\":\"org.spf4j.demo:jaxrs-spf4j-demo-schema:0.8:b\"}}"
>>
>> In HTTP context this fits well with content negotiation, and a client can
>> ask for a previous version like:
>>
>>
>> https://demo.spf4j.org/example/records/1?_Accept=application/json;avsc=%22{\%22$ref\%22:\%22org.spf4j.demo:jaxrs-spf4j-demo-schema:0.4:b\%22}%22
>> <https://demo.spf4j.org/example/records/1?_Accept=application/json;avsc=%22%7B%5C%22$ref%5C%22:%5C%22org.spf4j.demo:jaxrs-spf4j-demo-schema:0.4:b%5C%22%7D%22>
>>
>>
>
>> Note on $ref, it is an extension to avsc I use to reference schemas from
>> maven repos. (see
>> https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroReferences if
>> interested in more detail)
>>
>
> Interesting stuff. I like the idea of being able to get the server to
> check the desired client encoding, although I'm somewhat wary of the
> potential security implications of $ref with arbitrary URLs.
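
For readers unfamiliar with carrying a schema in a media type parameter, the following Python sketch shows roughly how the `avsc` parameter quoted above could be built and read back. It is an assumption-laden illustration, not the spf4j implementation: `build_media_type` and `parse_media_type` are hypothetical helpers, and the regex only understands a single quoted `avsc` parameter, so real code would want a proper MIME parameter parser.

```python
import json
import re


def build_media_type(base, schema):
    """Append the schema as a quoted avsc parameter to a base media type."""
    avsc = json.dumps(schema, separators=(",", ":"))
    # Inside a quoted-string, backslashes and double quotes must be escaped.
    quoted = avsc.replace("\\", "\\\\").replace('"', '\\"')
    return f'{base};avsc="{quoted}"'


# Matches ;avsc="..." where the value may contain backslash-escaped characters.
_AVSC_PARAM = re.compile(r';\s*avsc="((?:[^"\\]|\\.)*)"')


def parse_media_type(value):
    """Return (base_type, schema); schema is None when no avsc parameter is present."""
    base = value.split(";", 1)[0].strip()
    match = _AVSC_PARAM.search(value)
    if match is None:
        return base, None
    unescaped = re.sub(r"\\(.)", r"\1", match.group(1))
    return base, json.loads(unescaped)


if __name__ == "__main__":
    schema = {
        "type": "array",
        "items": {"$ref": "org.spf4j.demo:jaxrs-spf4j-demo-schema:0.8:b"},
    }
    header = build_media_type("application/avro-x+json", schema)
    print(header)
    print(parse_media_type(header))
```

Note that the sketch only extracts the `$ref` string; it does not resolve it, which is where the security concern about arbitrary references would come in.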
>
> Apart from the issues you raised, does my description of the proposed
> semantics seem reasonable? It could be slightly cleverer and avoid
> type-name wrapping in more situations, but this seemed like a nice balance
> between easy-to-explain and idiomatic-in-most-situations.
>
>   cheers,
>     rog.
>
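
As a footnote to the `timestamp-micros` question at the top of this message, the trade-off between binding the logical type to long and to string can be sketched in Python. This is an illustration under stated assumptions, not Avro library code: the helper names are hypothetical, and the size functions follow the binary encoding rules as I understand them from the spec (a long is a zigzag varint; a string is a long length followed by UTF-8 bytes).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_rfc3339(micros):
    """Render microseconds since the Unix epoch as an RFC 3339 string in UTC."""
    return (EPOCH + timedelta(microseconds=micros)).isoformat()


def rfc3339_to_micros(text):
    """Parse a timestamp with a numeric UTC offset back to microseconds."""
    # Integer division of timedeltas avoids floating-point rounding.
    return (datetime.fromisoformat(text) - EPOCH) // timedelta(microseconds=1)


def avro_long_size(n):
    """Bytes needed for a 64-bit long as a zigzag-encoded varint."""
    z = (n << 1) ^ (n >> 63)
    size = 1
    while z >= 0x80:
        z >>= 7
        size += 1
    return size


def avro_string_size(text):
    """Bytes needed for a string: length prefix (a long) plus UTF-8 data."""
    data = text.encode("utf-8")
    return avro_long_size(len(data)) + len(data)


if __name__ == "__main__":
    micros = 1579114260000000  # 2020-01-15T18:51:00 UTC
    text = micros_to_rfc3339(micros)
    print(text, avro_long_size(micros), avro_string_size(text))
```

For that example value the long occupies 8 bytes and the string form 26, and the string path also has to handle parse failures that the long path cannot produce.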