Re: More idiomatic JSON encoding for unions

roger peppe Thu, 16 Jan 2020 02:52:43 -0800

On Wed, 15 Jan 2020 at 18:51, Zoltan Farkas <[email protected]> wrote:


> What I mean with timestamp-micros, is that it is currently restricted to
> being bound to long,
> I see no reason why it should not be allowed to be bound to string as
> well. (the change should be simple to implement)
>

Wouldn't that have the implication of changing the binary representation too,
which is not necessarily desirable (it's bulkier, slower to decode and has
more potential error cases)?
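
To put a rough number on "bulkier", here is a minimal sketch that encodes
one timestamp both ways using the Avro binary rules for long (zigzag varint)
and string (length prefix plus UTF-8 bytes). The helper names and the sample
timestamp are mine, chosen for illustration; they don't come from any Avro
library.

```python
from datetime import datetime, timezone


def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer the way Avro binary encodes int/long:
    zigzag-map it to an unsigned value, then emit base-128 varint bytes."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def avro_string(s: str) -> bytes:
    """Encode a string as Avro binary does: byte length as a long, then UTF-8."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


# Illustrative value only (an assumption, not part of the proposal).
ts = datetime(2020, 1, 16, 10, 52, 43, tzinfo=timezone.utc)
micros = int(ts.timestamp()) * 1_000_000 + ts.microsecond

as_long = zigzag_varint(micros)          # timestamp-micros bound to long
as_string = avro_string(ts.isoformat())  # same instant as an RFC 3339 string

print(len(as_long), len(as_string))
```

For this value the long form takes 8 bytes and the string form 26, and the
string form additionally needs a parse step that can fail on malformed input.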


> regarding the media type, something like: application/avro.2+json would be
> fine.
>

Attaching the ".2" to "avro" rather than "json" seems to be implying a new
Avro version, rather than a new JSON-encoding version? Or is the idea that
the version number here is implying both the JSON-encoding version *and* the
underlying Avro version?  The MIME standard seems to be silent on this
AFAICS.
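
To make the placement question concrete, here is a small sketch that splits
a media type into type, subtype and structured-syntax suffix, assuming the
usual convention that the suffix is whatever follows the last "+". The
function name is hypothetical and this is not a full media-type parser
(it ignores parameters).

```python
def split_media_type(media_type: str):
    """Split 'type/subtype+suffix' into (type, subtype, suffix).

    The suffix is the text after the last '+', or '' when there is none.
    """
    top, _, rest = media_type.partition("/")
    subtype, plus, suffix = rest.rpartition("+")
    if not plus:
        # No '+' present: the whole remainder is the subtype.
        subtype, suffix = rest, ""
    return top, subtype, suffix


for candidate in (
    "application/avro+json",     # current JSON encoding
    "application/avro.2+json",   # version attached to "avro"
    "application/avro+json.v2",  # version attached to "json"
):
    print(candidate, "->", split_media_type(candidate))
```

Under that assumption, "avro.2+json" keeps the suffix as plain "json", while
"avro+json.v2" turns the suffix into "json.v2", which generic "+json" tooling
might not recognise.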


> Other than that the proposal looks good. Can you start a PR with the spec
> update?
>

I can do, but I don't hold out much hope of it getting merged. I started a
PR with a much more minor change <https://github.com/apache/avro/pull/738>
almost 2 months ago and haven't seen any response yet.

  cheers,
    rog.

>
> —Z
>
> On Jan 15, 2020, at 12:30 PM, roger peppe <[email protected]> wrote:
>
> On Wed, 15 Jan 2020 at 16:27, Zoltan Farkas <[email protected]> wrote:
>
>> See comments in-line below:
>>
>> On Jan 15, 2020, at 3:42 AM, roger peppe <[email protected]> wrote:
>>
>> Oops, I left arrays out! Two other thoughts:
>>
>>
>>    - I wonder if it might be worth hedging bets about logical types. It
>>    would be nice if (for example) a `timestamp-micros` value could be encoded
>>    as an RFC3339 string, so perhaps that should be allowed for, but maybe
>>    that's a step too far.
>>
>> I think logical types should stay above the encoding/decoding…
>> With timestamp-micros we could extend it to make it applicable to string
>> and implement the converters, and then in json you would have something
>> readable, but you would then have the same in binary and pay the
>> readability cost there as well.
>>
>
> I'm not sure what you mean there. I wouldn't expect the Avro binary format
> to be readable at all.
>
>> I implemented special handling for decimal logical type in my
>> encoder/decoder, but the best implementation I could do still feels like a
>> hack...
>>
>>
>>    - I wonder if there should be some indication of version so that you
>>    know which JSON encoding version you're reading. Perhaps the Avro schema
>>    could include a version field (maybe as part of a definition) so you know
>>    which version of the spec to use when encoding/decoding. Then bet-hedging
>>    wouldn't be quite as important.
>>
>> I think Schema needs to stay decoupled from the encoding. The same schema
>> can be encoded in various ways (I have a csv encoder/decoder for example,
>> https://demo.spf4j.org/example/records?_Accept=text/csv ).
>> I think the right abstraction for what you are looking for is the Media
>> Type (https://en.wikipedia.org/wiki/Media_type ).
>> It would be helpful to “standardize” the media types for the avro
>> encodings:
>>
>
> Yes, on reflection, I agree, even though not every possible medium has a
> media type. For example, what if we're storing JSON data in a file? I guess
> it would be up to us to store the type along with the data, as the registry
> message wire format
> <https://docs.confluent.io/current/schema-registry/serializer-formatter.html#wire-format>
> does, for example by wrapping the entire value in another JSON object.
>
>
>> Here is what I mean (with some examples where the same schema is served
>> with different encodings):
>>
>> 1) Binary: “application/avro”
>> https://demo.spf4j.org/example/records?_Accept=application/avro
>> 2) Current Json: “application/avro+json”
>> https://demo.spf4j.org/example/records?_Accept=application/avro-x%2Bjson
>> <https://demo.spf4j.org/example/records?_Accept=application/avro+json>
>> 3) New Json: “application/avro-x+json” ?
>> https://demo.spf4j.org/example/records?_Accept=application/avro-x%2Bjson
>> <https://demo.spf4j.org/example/records?_Accept=application/avro+json>
>>
>
> ISTM that "x" isn't a hugely descriptive qualifier there. How about
> "application/avro+json.v2" ? Then it's clear what to do if we want to make
> another version.
>
>
>
>> The media type including the avro schema (like you can see in the
>> response ContentType in the headers above) can provide complete type
>> information to be able to read an Avro object from a byte stream.
>>
>>
>> application/avro-x+json;avsc="{\"type\":\"array\",\"items\":{\"$ref\":\"org.spf4j.demo:jaxrs-spf4j-demo-schema:0.8:b\"}}"
>>
>> In HTTP context this fits well with content negotiation, and a client can
>> ask for a previous version like:
>>
>>
>> https://demo.spf4j.org/example/records/1?_Accept=application/json;avsc=%22{\%22$ref\%22:\%22org.spf4j.demo:jaxrs-spf4j-demo-schema:0.4:b\%22}%22
>> <https://demo.spf4j.org/example/records/1?_Accept=application/json;avsc=%22%7B%5C%22$ref%5C%22:%5C%22org.spf4j.demo:jaxrs-spf4j-demo-schema:0.4:b%5C%22%7D%22>
>>
>>
>
>> Note on $ref,  it is an extension to avsc I use to reference schemas from
>> maven repos. (see
>> https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroReferences if
>> interested in more detail)
>>
>
> Interesting stuff. I like the idea of being able to get the server to
> check the desired client encoding, although I'm somewhat wary of the
> potential security implications of $ref with arbitrary URLs.
>
> Apart from the issues you raised, does my description of the proposed
> semantics seem reasonable? It could be slightly cleverer and avoid
> type-name wrapping in more situations, but this seemed like a nice balance
> between easy-to-explain and idiomatic-in-most-situations.
>
>    cheers,
>      rog.
>
>
>
