[Python-ideas] json library: non-standards-compliant by default, and what to do about it.

Serge Bazanski Tue, 16 Jun 2020 11:01:23 -0700

Hi list,

as you might be aware, the json library is not standards-compliant [1]
by default: when fed NaN, Inf or -Inf floating point values in the
serialization input, it will output NaN, Infinity and -Infinity literals
in serialized form, unless the keyword argument allow_nan is explicitly
set to False - and by default it's set to True.
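
To make the behavior concrete, here is a minimal sketch of both modes.
It only uses the standard json module; the variable names are mine and
purely illustrative.

```python
import json

values = [float("nan"), float("inf"), float("-inf")]

# Default (allow_nan=True): the encoder emits literals that RFC 8259
# does not permit.
lenient = json.dumps(values)
print(lenient)  # [NaN, Infinity, -Infinity]

# allow_nan=False: non-finite floats are rejected with a ValueError.
strict_error = None
try:
    json.dumps(values, allow_nan=False)
except ValueError as exc:
    strict_error = exc
    print("rejected:", exc)
```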


Therefore, the current state of affairs is that a simple `import json;
json.dumps(float("NaN"))` is non-standards compliant. There is a
symmetrical issue with deserialization - json.load and friends will by
default happily treat NaN/Inf/-Inf values as valid and convert them to
Python float values. However, I'd like to focus on the
emitting/encoding/serializing part, as I think that's the one that's
more problematic in practice. A parser that is by default more lenient
than the standard dictates is generally less of an issue than a
serializer that is.
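
For completeness, the decoding side can already be made strict today
through the existing parse_constant hook, which json.loads calls for
the strings 'NaN', 'Infinity' and '-Infinity'. A rough sketch, where
reject_constant is a hypothetical helper name of my own and not part
of the library:

```python
import json
import math

def reject_constant(name):
    # Hypothetical helper: called by the decoder with "NaN",
    # "Infinity" or "-Infinity" when one of those literals is seen.
    raise ValueError(f"non-compliant JSON constant: {name}")

# Default: the literals are accepted and turned into Python floats.
parsed = json.loads("[NaN, Infinity, -Infinity]")
print(parsed)

# With the hook installed, the same kind of input is rejected.
decode_error = None
try:
    json.loads("[NaN]", parse_constant=reject_constant)
except ValueError as exc:
    decode_error = exc
    print("rejected:", exc)
```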

A quick Google search for 'allow_nan python github' brings up many, many
examples of people being bitten by either the fact that their code was
not emitting standards-compliant JSON, or by the fact that they have to
deal with an external Python system that emits NaN/Inf/-Inf [2]. From my
experience it's not uncommon to find existing, mature Python codebases
that exhibit this issue, and I'd like future Python users to notice
early that their code is likely emitting non-compliant JSON values, and
take appropriate actions.

Is there a general consensus on this state of affairs, or some
discussion about this that I've missed? As far as I can tell, this
behavior has existed in Python (and simplejson) since at least 2005.

What does the list think of the following two ideas:

1)  Document this lack of standards compliance better - e.g., introduce a
    big emphasized box on top of the Python manual for the json library
    that mentions the importance of the allow_nan flag. Or,

2)  Fix the current behavior of JSON encoding in Python with regard
    to NaN/Inf/-Inf values - keeping in mind that a simple flip of
    allow_nan to False by default would unfortunately cause obvious
    breakage to existing Python codebases (as attempts to emit
    out-of-range float values result in a ValueError being raised).

    A discussion naturally arises on whether increased standards
    compliance is worth the breakage of backwards compatibility, or
    whether there is a way to implement this change in a less drastic
    way (transition period with warning? defaulting to converting
    invalid values to null? something else?).
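
To illustrate what "converting invalid values to null" could look like
from user code today, here is a rough sketch. nan_to_null is a
hypothetical helper of my own, not an existing or proposed json API,
and it only walks dicts, lists and tuples (non-finite floats used as
dict keys are not handled).

```python
import json
import math

def nan_to_null(obj):
    """Hypothetical helper: recursively replace non-finite floats with None."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {key: nan_to_null(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [nan_to_null(value) for value in obj]
    return obj

data = {"ok": 1.5, "bad": float("nan"), "nested": [float("inf"), 2]}

# allow_nan=False acts as a safety net in case the helper missed a value.
encoded = json.dumps(nan_to_null(data), allow_nan=False)
print(encoded)  # {"ok": 1.5, "bad": null, "nested": [null, 2]}
```

Whether silently mapping these values to null is preferable to raising
is exactly the kind of trade-off I'd like to discuss.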

I've been sufficiently annoyed by this behavior that I'm willing to
drive either of these proposals to further discussion and possibly
implementation, but I wanted to first gauge the consensus on this, and
make sure there wasn't a previous discussion on this that I've missed.

Kind regards,
Serge Bazanski

[1] - By standards-compliant, I refer to compliance with RFC 8259 - but I
couldn't find _any_ JSON specification that would allow for NaN/Inf/-Inf.

[2] - I've personally had a similar experience not so long ago, which
forced me to implement https://github.com/q3k/cursedjson - so I'm
generally somewhat biased with regards to this.
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TCQZNSRM2Z5FGPJCTA6MGPHGFLM4WR4E/