Re: More idiomatic JSON encoding for unions

Zoltan Farkas Thu, 16 Jan 2020 13:02:40 -0800

I have hacked logical types in my fork to add this capability. If you want to 
take a look, see:
https://github.com/zolyfarkas/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/LogicalType.java#L78


My goal was to make decimal a number in JSON.
It is a hack, though: it works but won't win any beauty contests :-) and right now 
I don't see how to make it clean enough to be accepted into mainstream Avro.
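
To illustrate the difference (a minimal Python sketch, not the code in the fork; 
the function names are made up for this example): per the Avro spec, a decimal 
on bytes is the unscaled integer in big-endian two's complement, and the JSON 
encoding of bytes maps each byte to a code point 0-255.

```python
import json
from decimal import Decimal


def decimal_to_avro_bytes(value: Decimal, scale: int) -> bytes:
    """Encode value as the big-endian two's-complement unscaled integer."""
    # Shift the decimal point right by `scale` digits; digits beyond the
    # scale are rounded under the default decimal context.
    unscaled = int(value.scaleb(scale).to_integral_exact())
    # Always leave room for the sign bit (may use one byte more than minimal).
    length = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(length, byteorder="big", signed=True)


def bytes_to_avro_json_string(raw: bytes) -> str:
    """Avro JSON encoding of bytes: one code point (0-255) per byte."""
    return raw.decode("latin-1")


price = Decimal("12.34")
raw = decimal_to_avro_bytes(price, scale=2)

# What a reader of the current JSON encoding sees for the decimal field.
current = json.dumps({"price": bytes_to_avro_json_string(raw)})

# What a "decimal as a JSON number" encoding would look like instead.
idiomatic = '{"price": %s}' % price

print(current)
print(idiomatic)
```

Reading the numeric form back with json.loads(idiomatic, parse_float=Decimal) 
keeps the value exact instead of going through a binary float.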

It would be a lot cleaner to elevate these logical types to first-class types 
and standardize the encoding appropriately.
Decimal clearly needs to be a first-class type; I am not sure about 
timestamp-micros...
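
As a sketch of what a readable encoding for timestamp-micros could look like 
(assuming an RFC 3339 string in UTC, as floated in the quoted thread below; the 
helper names are hypothetical and nothing here is in the Avro spec):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_rfc3339(micros: int) -> str:
    """Render microseconds since the Unix epoch as an RFC 3339 UTC string."""
    ts = EPOCH + timedelta(microseconds=micros)
    return ts.isoformat(timespec="microseconds").replace("+00:00", "Z")


def rfc3339_to_micros(text: str) -> int:
    """Parse an RFC 3339 string back into microseconds since the epoch."""
    ts = datetime.fromisoformat(text.replace("Z", "+00:00"))
    # Integer division of timedeltas avoids float rounding.
    return (ts - EPOCH) // timedelta(microseconds=1)


stamp = 1579208560000000  # timestamp-micros value stored as an Avro long
text = micros_to_rfc3339(stamp)
print(text)
print(rfc3339_to_micros(text) == stamp)
```

The binary encoding would still carry the long; only the JSON side would show 
the string.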

—Z


> On Jan 16, 2020, at 2:20 PM, roger peppe <rogpe...@gmail.com> wrote:
> 
> On Thu, 16 Jan 2020, 18:59 Zoltan Farkas, <zolyfar...@yahoo.com 
> <mailto:zolyfar...@yahoo.com>> wrote:
> answers inline
> 
>> On Jan 16, 2020, at 5:51 AM, roger peppe <rogpe...@gmail.com 
>> <mailto:rogpe...@gmail.com>> wrote:
>> 
>> On Wed, 15 Jan 2020 at 18:51, Zoltan Farkas <zolyfar...@yahoo.com 
>> <mailto:zolyfar...@yahoo.com>> wrote:
>> What I mean about timestamp-micros is that it is currently restricted to 
>> being bound to long.
>> I see no reason why it should not be allowed to be bound to string as well 
>> (the change should be simple to implement).
>> 
>> Wouldn't that have the implication of changing the binary representation too, 
>> which is not necessarily desirable (it's bulkier, slower to decode and has 
>> more potential error cases)?
> 
> Yes, it would, but this is how logical types work, and I see no good way to 
> change this. (This is what I meant by paying the readability cost in a place 
> where it is irrelevant.)
> 
> So you think that the JSON representation should always match the underlying 
> type and ignore the logical type? I can understand the reasoning behind that, 
> but it doesn't feel very user friendly in some cases (thinking of decimal and 
> duration in particular).
> 
> Given their privileged place in the specification, I was thinking that some 
> logical types could gain privilege here.
> 
> Aside: I'm a bit concerned about the potential for data corruption from 
> interchange between timestamp-micros and timestamp-millis, which, as far as 
> I understand the spec, look like they'll be treated as compatible with each 
> other.
> 
> 
>> 
>> 
>> regarding the media type, something like: application/avro.2+json would be 
>> fine.
>> 
>> Attaching the ".2" to "avro" rather than "json" seems to be implying a new 
>> Avro version, rather than a new JSON-encoding version? Or is the idea that 
>> the version number here is implying both the JSON-encoding version and the 
>> underlying Avro version?  The MIME standard seems to be silent on this 
>> AFAICS.
>> 
> 
> The reason why I would use +json at the end is that it would be a subtype 
> suffix ( https://en.wikipedia.org/wiki/Media_type#Suffix ), and most browsers 
> will recognize it as JSON and potentially format it...
> 
> Ah, nice, I wasn't aware of RFC 6838.
> 
>> 
>> Other than that, the proposal looks good. Can you start a PR with the spec 
>> update?
>> 
>> I can do, but I don't hold out much hope of it getting merged. I started a 
>> PR with a much more minor change <https://github.com/apache/avro/pull/738> 
>> almost 2 months ago and haven't seen any response yet.
> 
> Send out an email on the dev mailing list; the committers seem more responsive 
> lately...
> 
> I'll give it a go :)
> 
>   cheers,
>     rog.
> 
>> 
>>   cheers,
>>     rog.
>> 
>> —Z
>> 
>>> On Jan 15, 2020, at 12:30 PM, roger peppe <rogpe...@gmail.com 
>>> <mailto:rogpe...@gmail.com>> wrote:
>>> 
>>> On Wed, 15 Jan 2020 at 16:27, Zoltan Farkas <zolyfar...@yahoo.com 
>>> <mailto:zolyfar...@yahoo.com>> wrote:
>>> See comments in-line below:
>>> 
>>>> On Jan 15, 2020, at 3:42 AM, roger peppe <rogpe...@gmail.com 
>>>> <mailto:rogpe...@gmail.com>> wrote:
>>>> 
>>>> Oops, I left arrays out! Two other thoughts: 
>>>> 
>>>> I wonder if it might be worth hedging bets about logical types. It would 
>>>> be nice if (for example) a `timestamp-micros` value could be encoded as an 
>>>> RFC3339 string, so perhaps that should be allowed for, but maybe that's a 
>>>> step too far.
>>> I think logical types should stay above the encoding/decoding…  
>>> With timestamp-micros we could extend it to make it applicable to string 
>>> and implement the converters, and then in json you would have something 
>>> readable, but you would then have the same in binary and pay the 
>>> readability cost there as well.
>>> 
>>> I'm not sure what you mean there. I wouldn't expect the Avro binary format 
>>> to be readable at all.
>>> 
>>> I implemented special handling for decimal logical type in my 
>>> encoder/decoder, but the best implementation I could do still feels like a 
>>> hack...
>>> 
>>>> I wonder if there should be some indication of version so that you know 
>>>> which JSON encoding version you're reading. Perhaps the Avro schema could 
>>>> include a version field (maybe as part of a definition) so you know which 
>>>> version of the spec to use when encoding/decoding. Then bet-hedging 
>>>> wouldn't be quite as important.
>>> I think Schema needs to stay decoupled from the encoding. The same schema 
>>> can be encoded in various ways (I have a CSV encoder/decoder, for example: 
>>> https://demo.spf4j.org/example/records?_Accept=text/csv ).
>>> I think the right abstraction for what you are looking for is the Media 
>>> Type ( https://en.wikipedia.org/wiki/Media_type ). 
>>> It would be helpful to "standardize" the media types for the Avro encodings:
>>> 
>>> Yes, on reflection, I agree, even though not every possible medium has a 
>>> media type. For example, what if we're storing JSON data in a file? I guess 
>>> it would be up to us to store the type along with the data, as the registry 
>>> message wire format 
>>> <https://docs.confluent.io/current/schema-registry/serializer-formatter.html#wire-format>
>>>  does, for example by wrapping the entire value in another JSON object.
>>>  
>>> Here is what I mean, (with some examples where the same schema is served 
>>> with different encodings):
>>> 
>>> 1) Binary: "application/avro" 
>>> https://demo.spf4j.org/example/records?_Accept=application/avro
>>> 2) Current Json: "application/avro+json" 
>>> https://demo.spf4j.org/example/records?_Accept=application/avro%2Bjson
>>> 3) New Json: "application/avro-x+json" ? 
>>> https://demo.spf4j.org/example/records?_Accept=application/avro-x%2Bjson
>>> 
>>> ISTM that "x" isn't a hugely descriptive qualifier there. How about 
>>> "application/avro+json.v2" ? Then it's clear what to do if we want to make 
>>> another version.
>>> 
>>>  
>>> The media type including the Avro schema (as you can see in the response 
>>> ContentType in the headers above) can provide complete type information to 
>>> be able to read an Avro object from a byte stream.
>>> 
>>> application/avro-x+json;avsc="{\"type\":\"array\",\"items\":{\"$ref\":\"org.spf4j.demo:jaxrs-spf4j-demo-schema:0.8:b\"}}"
>>> 
>>> In HTTP context this fits well with content negotiation, and a client can 
>>> ask for a previous version like:
>>> 
>>> https://demo.spf4j.org/example/records/1?_Accept=application/json;avsc=%22{\%22$ref\%22:\%22org.spf4j.demo:jaxrs-spf4j-demo-schema:0.4:b\%22}%22
>>>  
>>> <https://demo.spf4j.org/example/records/1?_Accept=application/json;avsc=%22%7B%5C%22$ref%5C%22:%5C%22org.spf4j.demo:jaxrs-spf4j-demo-schema:0.4:b%5C%22%7D%22>
>>>  
>>> 
>>> Note on $ref: it is an extension to avsc I use to reference schemas from 
>>> Maven repos (see 
>>> https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroReferences if 
>>> interested in more detail).
>>> 
>>> Interesting stuff. I like the idea of being able to get the server to check 
>>> the desired client encoding, although I'm somewhat wary of the potential 
>>> security implications of $ref with arbitrary URLs.
>>> 
>>> Apart from the issues you raised, does my description of the proposed 
>>> semantics seem reasonable? It could be slightly cleverer and avoid 
>>> type-name wrapping in more situations, but this seemed like a nice balance 
>>> between easy-to-explain and idiomatic-in-most-situations.
>>> 
>>>    cheers,
>>>      rog.
>>> 
>> 
>
