Oops, I left arrays out! Two other thoughts:

   - I wonder if it might be worth hedging bets about logical types. It
   would be nice if (for example) a `timestamp-micros` value could be encoded
   as an RFC3339 string, so perhaps that should be allowed for, but maybe
   that's a step too far.
   - I wonder if there should be some indication of version so that you
   know which JSON encoding version you're reading. Perhaps the Avro schema
   could include a version field (maybe as part of a definition) so you know
   which version of the spec to use when encoding/decoding. Then bet-hedging
   wouldn't be quite as important.
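To make the logical-type hedge concrete, here is a minimal Python sketch. It assumes `timestamp-micros` (microseconds since the Unix epoch, UTC) were allowed to appear as an RFC 3339 string with microsecond precision and a trailing "Z". The function names are hypothetical, and nothing like this is part of the proposal below.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # assumed RFC 3339 profile: UTC, 6 fractional digits

def timestamp_micros_to_rfc3339(micros: int) -> str:
    """Render a timestamp-micros value as an RFC 3339 string (hypothetical encoding)."""
    # timedelta arithmetic keeps microseconds exact (no float timestamps).
    return (_EPOCH + timedelta(microseconds=micros)).strftime(_FORMAT)

def rfc3339_to_timestamp_micros(s: str) -> int:
    """Inverse of timestamp_micros_to_rfc3339 for strings in the same profile."""
    dt = datetime.strptime(s, _FORMAT).replace(tzinfo=timezone.utc)
    delta = dt - _EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds

print(timestamp_micros_to_rfc3339(1_579_039_020_123_456))  # 2020-01-14T21:57:00.123456Z
```

Using integer timedelta arithmetic rather than a float timestamp keeps the round trip exact, including for values before 1970.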



*JSON Encoding*

Except for unions, the JSON encoding is the same as is used to encode
field default values.

The value of a union is encoded in JSON as follows:

   - if all values of the union can be distinguished *unambiguously* (see
   below), the JSON encoding is the same as is used to encode field default
   values for the type
   - otherwise it is encoded as a JSON object with one name/value pair
   whose name is the type's name and whose value is the recursively encoded
   value. For Avro's named types (record, fixed or enum) the user-specified
   name is used; for other types the type name is used.
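A minimal sketch of the union rule above, assuming the branch value has already been encoded as for field default values and that the caller knows whether the union is unambiguous. `encode_union_value` is a hypothetical name. Note that the current Avro spec encodes a null union value as a bare JSON `null` even when other branches are wrapped; the text above doesn't say whether that exception carries over, so the sketch doesn't special-case it.

```python
import json
from typing import Any

def encode_union_value(branch_name: str, encoded_value: Any, unambiguous: bool) -> Any:
    """Apply the proposed union rule to an already-encoded branch value.

    branch_name: the user-specified name for record, fixed or enum branches,
        otherwise the type name ("string", "long", "array", ...).
    unambiguous: whether the members' JSON type sets are mutually disjoint.
    """
    if unambiguous:
        # Same JSON as a field default value of the branch type.
        return encoded_value
    # Otherwise wrap it in a single-pair object keyed by the branch name.
    return {branch_name: encoded_value}

print(json.dumps(encode_union_value("string", "hi", unambiguous=True)))   # "hi"
print(json.dumps(encode_union_value("string", "hi", unambiguous=False)))  # {"string": "hi"}
```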

Unambiguity is defined as follows:

An Avro value can be encoded as one of a set of JSON types:

   - null encodes as {null}
   - boolean encodes as {boolean}
   - int encodes as {number}
   - long encodes as {number}
   - float encodes as {number, string}
   - double encodes as {number, string}
   - bytes encodes as {string}
   - string encodes as {string}
   - any enum type encodes as {string}
   - any fixed type encodes as {string}
   - any array type encodes as {array}
   - any map type encodes as {object}
   - any record type encodes as {object}

A union is considered *unambiguous* if the JSON type sets for all the
members of the union form mutually disjoint sets.
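The table and the disjointness test are simple enough to sketch directly. This is a minimal Python version, assuming union members are identified by their kind rather than by name (so two record branches are two "record" entries); `JSON_TYPES` and `is_unambiguous` are hypothetical names. The sketch maps fixed to {string}, since Avro encodes fixed default values as JSON strings.

```python
from itertools import combinations

# JSON type sets per Avro type kind, following the list above.
JSON_TYPES = {
    "null": {"null"},
    "boolean": {"boolean"},
    "int": {"number"},
    "long": {"number"},
    "float": {"number", "string"},   # string reserved for NaN/infinity
    "double": {"number", "string"},
    "bytes": {"string"},
    "string": {"string"},
    "enum": {"string"},
    "fixed": {"string"},
    "array": {"array"},
    "map": {"object"},
    "record": {"object"},
}

def is_unambiguous(member_kinds: list[str]) -> bool:
    """True if the JSON type sets of all union members are pairwise disjoint."""
    sets = [JSON_TYPES[kind] for kind in member_kinds]
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))

print(is_unambiguous(["null", "string"]))    # True
print(is_unambiguous(["string", "double"]))  # False
```

Under this definition the union {null, [some type]} case from Zoltan's implementation (quoted below) is always unambiguous, because only null maps to the JSON null type.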

Note that float and double are considered ambiguous with respect to string
because in the future, Avro might support encoding NaN and infinity values
as strings.
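To see why, suppose NaN and the infinities were one day written as JSON strings. In this minimal Python sketch the spellings "NaN", "Infinity" and "-Infinity" and the name `encode_float` are assumptions. In an unwrapped ["string", "float"] union, the string "NaN" and a float NaN would then produce identical JSON text.

```python
import json
import math

def encode_float(x: float):
    """Hypothetical float/double encoding: finite values stay JSON numbers,
    NaN and the infinities become strings (spellings assumed)."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Infinity" if x > 0 else "-Infinity"
    return x

# Two different Avro values from a ["string", "float"] union:
from_string = json.dumps("NaN")
from_float = json.dumps(encode_float(float("nan")))
print(from_string == from_float)  # True: a decoder couldn't tell them apart
```

Without the string fallback, Python's `json.dumps` would emit a bare `NaN` token by default, which isn't valid JSON.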

On Tue, 14 Jan 2020 at 21:57, roger peppe <rogpe...@gmail.com> wrote:

> On Tue, 14 Jan 2020 at 19:26, Zoltan Farkas <zolyfar...@yahoo.com> wrote:
>
>> Makes sense,
>>
>> We have to agree on the scope of this implementation.
>>
>> Right now the implementation I have in Java handles only the:
>>
>> union {null, [some type]} situation.
>>
>> Are we ok with this for a start?
>>
>
> I'm not sure that it's worth publishing a half-way solution, as if people
> start using it and a fuller solution is implemented, there will be three
> incompatible standards, which isn't ideal.
>
>>
>> What I see beyond that is handling:
>>
>> 1) union {string, double} (although we have to specify behavior for NaN
>> and positive and negative infinity); union {string, boolean}; …
>>
>
> My thought, as mentioned at the beginning of this thread, is to omit the
> wrapping when all the members of the union encode to distinct JSON token
> types (the JSON token types being: null, boolean, string, number, object
> and array).
>
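On the decoding side, the token-type rule in the quoted paragraph above means a reader can pick the union branch from the JSON token type alone. A minimal Python sketch, assuming the JSON has already been parsed with `json.loads` and that the caller supplies each branch's token types through a hypothetical `branch_token_types` mapping:

```python
from typing import Any

def json_token_type(value: Any) -> str:
    """Classify a value produced by json.loads into one of the six JSON token types."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")

def pick_branch(value: Any, branch_token_types: dict[str, set[str]]) -> str:
    """Return the only branch whose token types include the value's token type."""
    token = json_token_type(value)
    matches = [name for name, types in branch_token_types.items() if token in types]
    if len(matches) != 1:
        # Zero matches: invalid data. More than one: the union is ambiguous.
        raise ValueError(f"no unique union branch for JSON {token}")
    return matches[0]

union = {"null": {"null"}, "long": {"number"}, "string": {"string"}}
print(pick_branch(42, union))  # long
```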
> I think that we could probably leave out explicit mention of NaN and
> infinity, as that's an issue with schemas too, and there's no obviously
> good solution. That said, if we *did* want to solve the issue of NaN and
> infinity in the future, things might get awkward with respect to this
> thread's proposal, because it's likely that the only reasonable way to
> solve that issue is to encode NaN and infinity as "NaN" and "±Infinity",
> which means that the union ["string", "float"] becomes ambiguous if we
> leave out the type name for that case.
>
> It seems that it's not unheard-of to use a string representation for these
> float values (see https://issues.apache.org/jira/browse/AVRO-1290).
>
> So perhaps we could define the format something like this:
>
> *JSON Encoding*
>
> Except for unions, the JSON encoding is the same as is used to encode
> field default values.
>
> The value of a union is encoded in JSON as follows:
>
>    - if all values of the union can be distinguished *unambiguously* (see
>    below), the JSON encoding is the same as is used to encode field default
>    values for the type
>    - otherwise it is encoded as a JSON object with one name/value pair
>    whose name is the type's name and whose value is the recursively encoded
>    value. For Avro's named types (record, fixed or enum) the user-specified
>    name is used; for other types the type name is used.
>
> Unambiguity is defined as follows:
>
> An Avro value can be encoded as one of a set of JSON types:
>
>    - null encodes as {null}
>    - boolean encodes as {boolean}
>    - int encodes as {number}
>    - long encodes as {number}
>    - float encodes as {number, string}
>    - double encodes as {number, string}
>    - bytes encodes as {string}
>    - string encodes as {string}
>    - any enum encodes as {string}
>    - any map encodes as {object}
>    - any record encodes as {object}
>
> A union is considered *unambiguous* if the JSON type sets for all the
> members of the union form mutually disjoint sets.
>
> Note that float and double are considered ambiguous with respect to string
> because in the future, Avro might support encoding NaN and infinity values
> as strings.
>
> WDYT?
>
>> 2) Make decimal an Avro first-class type. The current logical type approach is
>> not natural in JSON (see https://issues.apache.org/jira/browse/AVRO-2164).
>>
>
>> For 1.9.x, 2) is probably a non-starter.
>>
>
> Yes, this sounds a bit out of scope to me. It would be nice if decimal
> values were represented as a human-readable decimal number (possibly a JSON
> string to survive round-trips), but that should perhaps be part of a larger
> change to improve decimal support in general. Interestingly, if we were
> able to represent decimal values as JSON numbers (for example when
> they're unambiguously representable as such), that would fit fine with the
> above description, because bytes would be considered ambiguous with respect
> to float.
>
>   cheers,
>     rog.
>
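For the decimal idea in the quoted reply above, here is a minimal Python sketch of the human-readable rendering. It assumes the standard decimal logical type on bytes (a two's-complement, big-endian unscaled integer, with the scale taken from the schema). `decimal_bytes_to_string` is a hypothetical name, and whether the result would be emitted as a JSON string or a JSON number is exactly the open question.

```python
from decimal import Decimal

def decimal_bytes_to_string(raw: bytes, scale: int) -> str:
    """Render an Avro decimal as plain decimal text, e.g. "123.45"."""
    # The bytes hold the unscaled value as a signed big-endian integer.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Build the Decimal from a string so no context rounding is applied,
    # then format with "f" to avoid exponent notation such as "1E-10".
    return format(Decimal(f"{unscaled}E-{scale}"), "f")

print(decimal_bytes_to_string((12345).to_bytes(2, "big", signed=True), 2))  # 123.45
```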
