Re: More idiomatic JSON encoding for unions

Zoltan Farkas Tue, 14 Jan 2020 07:01:57 -0800

I can go ahead and create a PR to add the Encoder/Decoder implementations.
Let me know if anyone else plans to do that, so we avoid duplicating work.


thanks

—Z

> On Jan 9, 2020, at 3:51 AM, Driesprong, Fokko <fo...@driesprong.frl> wrote:
> 
> Thanks for chipping in Zoltan and Sean. I did not plan to change the current 
> JSON encoder. My initial suggestion would make this an option that the user 
> can set. The default will be the current situation, so nothing should change 
> when upgrading to a newer version of Avro.
> 
> Cheers, Fokko
> 
> On Wed, Jan 8, 2020 at 21:39, Sean Busbey <bus...@apache.org 
> <mailto:bus...@apache.org>> wrote:
> I agree with Zoltan here. We have a really long history of maintaining 
> compatibility for encoders.
> 
> On Tue, Jan 7, 2020 at 10:06 AM Zoltan Farkas <zolyfar...@yahoo.com 
> <mailto:zolyfar...@yahoo.com>> wrote:
> Fokko, 
> 
> I am not sure we should be changing the existing JSON encoder.
> I think we should just add another encoder, so devs can use either one 
> based on their use case and stay backward compatible.
> 
> We should maybe standardize the content types for them. I have seen 
> application/avro being used for binary; for JSON we could have 
> application/avro+json for the current format and application/avro.2+json for 
> the new format.
> 
> At some point in the future we could deprecate the old one.
> 
> —Z
> 
> 
>> On Jan 7, 2020, at 2:41 AM, Driesprong, Fokko <fo...@driesprong.frl 
>> <mailto:fo...@driesprong.frl>> wrote:
>> 
>> I would be a great fan of this as well. This also bothered me. The tricky 
>> part here is to see when to release this because it will break the existing 
>> JSON structure. We could make this configurable as well.
>> 
>> Cheers, Fokko
>> 
>> On Mon, Jan 6, 2020 at 22:36, roger peppe <rogpe...@gmail.com 
>> <mailto:rogpe...@gmail.com>> wrote:
>> That's great, thanks! I thought this would probably have come up before.
>> 
>> Have you written down your changes in a somewhat more formal specification 
>> document, by any chance?
>> 
>>   cheers,
>>     rog.
>> 
>> 
>> On Mon, 6 Jan 2020, 18:50 zoly farkas, <zolyfar...@yahoo.com 
>> <mailto:zolyfar...@yahoo.com>> wrote:
>> I think there is consensus that this should be implemented, see [AVRO-1582] 
>> Json serialization of nullable fileds and fields with default values 
>> improvement. - ASF JIRA <https://issues.apache.org/jira/browse/AVRO-1582>
>> 
>> 
>> Here is a live example that returns sample data in Avro JSON: 
>> https://demo.spf4j.org/example/records/1?_Accept=application/avro%2Bjson
>> and the same record in the "natural" JSON encoding:
>> https://demo.spf4j.org/example/records/1?_Accept=application/json
>> which uses the encoder suggested as an implementation in the JIRA.
>> 
>> Somebody needs to find the time to do the work to integrate this...
>> 
>> --Z
>> 
>> 
>> 
>> 
>> On Monday, January 6, 2020, 12:36:44 PM EST, roger peppe <rogpe...@gmail.com 
>> <mailto:rogpe...@gmail.com>> wrote:
>> 
>> 
>> Hi,
>> 
>> The JSON encoding in the specification 
>> <https://avro.apache.org/docs/current/spec.html#json_encoding> wraps every 
>> non-null union value in an object keyed by an explicit type name. This means 
>> that a JSON-encoded Avro value containing a union is very rarely directly 
>> compatible with normal JSON formats.
>> 
>> For example, it's very common for a JSON-encoded value to allow a value 
>> that's either null or string. In Avro, that's trivially expressed as the 
>> union type ["null", "string"]. With conventional JSON, a string value "foo" 
>> would be encoded just as "foo", which is easily distinguished from null when 
>> decoding. However when using the Avro JSON format it must be encoded as 
>> {"string": "foo"}.
>> 
>> This means that Avro JSON-encoded values don't interchange easily with other 
>> JSON-encoded values.
>> 
>> AFAICS the main reason that the type name is always required in JSON-encoded 
>> unions is to avoid ambiguity. This particularly applies to record and map 
>> types, where it's not possible in general to tell which member of the union 
>> has been specified by looking at the data itself.
>> 
>> However, that reasoning doesn't apply if all the members of the union can be 
>> distinguished by their JSON token type.
>> 
>> I am considering using a JSON encoding that omits the type name when all the 
>> members of the union encode to distinct JSON token types (the JSON token 
>> types being: null, boolean, string, number, object and array).
>> 
>> For example, JSON-encoded values using the Avro schema ["null", "string", 
>> "int"] would encode as the literal values themselves (e.g. null, "foo", 
>> 999), but JSON-encoded values using the Avro schema ["int", "double"] would 
>> require the type name because the JSON lexeme doesn't distinguish between 
>> different kinds of number.
>> 
>> This would mean that it would be possible to represent a significant subset 
>> of "normal" JSON schemas with Avro. It seems to me that would potentially be 
>> very useful.
>> 
>> Thoughts? Is this a really bad idea to be contemplating? :)
>> 
>>   cheers,
>>     rog.
>> 
>> 
>
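
As an illustration of the union rule proposed in roger peppe's original
message above, here is a minimal Python sketch. It is not the Avro library's
API and not the encoder from AVRO-1582. The names JSON_TOKEN,
union_is_unambiguous, branch_label and encode_union are hypothetical, and the
sketch assumes that union members are written inline (no references to named
types by name) and that the value passed in is already JSON-compatible.

```python
import json

# JSON token type that each Avro type encodes to (per the Avro JSON encoding:
# bytes, enum and fixed all become JSON strings; record and map become objects).
JSON_TOKEN = {
    "null": "null",
    "boolean": "boolean",
    "int": "number", "long": "number", "float": "number", "double": "number",
    "string": "string", "bytes": "string", "enum": "string", "fixed": "string",
    "record": "object", "map": "object",
    "array": "array",
}


def token_type(member):
    """JSON token type of one union member (a type name or an inline schema dict)."""
    kind = member if isinstance(member, str) else member["type"]
    return JSON_TOKEN[kind]


def union_is_unambiguous(union):
    """True if every member of the union encodes to a distinct JSON token type."""
    tokens = [token_type(m) for m in union]
    return len(set(tokens)) == len(tokens)


def branch_label(member):
    """Key used by the current Avro JSON encoding: the user-specified name for
    named types, otherwise the type name."""
    if isinstance(member, str):
        return member
    return member.get("name", member["type"])


def encode_union(union, branch, value, natural=False):
    """Encode `value`, which belongs to union[branch].

    natural=False mimics the current format: {"<type name>": value}.
    natural=True applies the proposed rule: drop the wrapper when the
    members are distinguishable by JSON token type alone.
    """
    member = union[branch]
    if member == "null":
        return None  # null is never wrapped, in either format
    if natural and union_is_unambiguous(union):
        return value
    return {branch_label(member): value}


if __name__ == "__main__":
    nullable = ["null", "string"]
    print(json.dumps(encode_union(nullable, 1, "foo")))                # {"string": "foo"}
    print(json.dumps(encode_union(nullable, 1, "foo", natural=True)))  # "foo"
    # Both members are JSON numbers, so the type name is still required.
    print(json.dumps(encode_union(["int", "double"], 0, 1, natural=True)))  # {"int": 1}
```

Keeping the wrapper behind a flag (here the assumed `natural` parameter)
mirrors the position taken in the thread: the current format stays the
default, and the new behaviour is opt-in.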
