I've written a rough draft of a PEP for standard library inclusion, attached to this email. Comments/improvements welcome - I tried to leave most of the differences between modules in the "Issues" section.
PEP: XXX
Title: A JSON handling library
Version: $Revision$
Last-Modified: $Date$
Author: John Millikin <[EMAIL PROTECTED]>
Discussions-To: web-sig@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Apr-2008
Python-Version: 2.6
Post-History: XXX

Abstract
========

This PEP describes a proposed library for parsing and generating data
in the `JSON` [1]_ format. JSON stands for "JavaScript Object
Notation", and is described by RFC 4627 [2]_.

Rationale
=========

JSON is a widely-used data interchange format, often used for sending
data to and from a web browser using JavaScript. Its simplicity and
ease of use have led to various implementations with varying degrees
of compliance with the RFC. By bundling a capable implementation in
Python's standard library, I hope to reduce or eliminate the need for
choosing a JSON library.

Existing Public libraries
=========================

* Bob Ippolito's simplejson [3]_
* Deron Meranda's demjson [4]_
* John Millikin's jsonlib [5]_
* Alan Kennedy mentioned on web-sig [6]_ that he has written an
  implementation for Jython, but I couldn't find source code for it.

Each of these has a different API, a different degree of strictness,
and a different quality of error handling.

Module Interface
================

Parsing
-------

Encoding Autodetection
''''''''''''''''''''''

The RFC requires that JSON is encoded in one of the Unicode
encodings. Because the first two characters in a valid JSON
expression are always from the ASCII set, it is possible to reliably
determine the encoding of input data. Functions for autodetecting
encoding exist in jsonlib and demjson.

Parsing API
'''''''''''

A JSON expression may be parsed using the ``parse`` function::

    parse (bytes_or_string)

If the input is a ``bytes`` object, the encoding should be
auto-detected as above. If input has been received in a non-standard
encoding, it can be manually decoded and passed to ``parse`` as a
string.

The return value is either a sequence or mapping, depending on the
input.

Serialization
-------------

Python objects may be serialized using the ``generate`` function::

    generate (obj, indent = None, ascii_only = True, encoding = 'utf-8')

``indent`` is used to control pretty-printing.
If ``None``, no pretty printing will be performed and the output will
be maximally compact. If ``indent`` is a string, that string will be
used for indenting nested values. The only values allowed in
``indent`` are those that are valid JSON whitespace; these are U+0009,
U+000A, U+000D, and U+0020.

``ascii_only`` controls whether the output may contain characters
above the ASCII set. If ``True``, all non-ASCII characters must be
escaped using \\uXXXX syntax. Otherwise, non-ASCII characters will be
included without escaping. Depending on the output encoding and values
of the characters, this might be more size-efficient.

``encoding`` specifies how the output is to be encoded. If ``None``,
the output will be a Unicode string. By default, JSON is encoded in
UTF-8.

Note: this is the set of options generally supported by
implementations. For a full treatment of other options, see
`Options for Serialization`_.

Other
-----

XXX Should the encoding autodetection function be a part of the
public API?

Issues
======

Representation of Fractional Numbers
------------------------------------

The author of jsonlib feels that fractional numbers should be parsed
into an instance of ``decimal.Decimal``, to avoid issues with values
that cannot be represented exactly by the ``float`` type [7]_. The
spec does not require a decimal, but I dislike losing information in
the parsing stage. Any implementation in the standard library should,
in my opinion, at least offer a parameter for lossless parsing of
number values.

The author of simplejson disagrees [8]_, saying that:

    Practically speaking I've tried using decimal instead of float
    for JSON and it's generally The Wrong Thing To Do. The spec
    doesn't say what to do about numbers, but for proper JavaScript
    interaction you want to do things that approximate what JS is
    going to do: 64-bit floating point.

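As a small illustration of the disagreement, this sketch sums the JSON
number literals ``0.1`` and ``0.2`` once as ``float`` and once as
``decimal.Decimal``. ``sum_as`` is a hypothetical helper written for
this example, and no JSON library is involved::

    from decimal import Decimal

    def sum_as(number_type, literals):
        """Convert each number literal with number_type and add them up.

        Hypothetical helper for illustration only.
        """
        total = number_type("0")
        for literal in literals:
            total += number_type(literal)
        return total

    # Number literals as they would appear in JSON text.
    literals = ["0.1", "0.2"]

    # Binary floating point: each literal is rounded to the nearest
    # 64-bit float before the addition.
    float_total = sum_as(float, literals)

    # Decimal: each literal is kept exactly as written.
    decimal_total = sum_as(Decimal, literals)

The ``float`` total compares unequal to ``0.3``, while the ``Decimal``
total equals ``Decimal("0.3")``. Which of the two behaviours a caller
wants is the question discussed in this section.
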
demjson appears to have some sort of float precision detection
mechanism, and returns instances of ``float`` only if they can
represent a value exactly.

Serializing User-defined Types
------------------------------

There should be some way for a user to specify how types not known to
the JSON library should be serialized. For example, Django needs to
serialize types related to date and time.

* simplejson supports a ``default`` parameter to ``dump`` and
  ``dumps``, which should be a callable that accepts a value and
  returns a serializable object.
* demjson supports a ``json_equivalent`` method of objects to encode,
  or users may subclass the ``demjson.JSON`` class and override the
  ``encode_default`` method.
* jsonlib supports an ``on_unknown`` parameter to ``write``, which
  acts like simplejson's ``default``.
* Alan Kennedy's implementation checks for a ``__json__`` method of
  objects to serialize [6]_.

Options for Serialization
-------------------------

There are options supported by only a few of the implementations:

``allow_nan``
    In ``simplejson``, allows Infinity and NaN to be serialized. These
    values are not supported by JSON, but are supported in JavaScript.

``check_circular``
    In ``simplejson``, allows the check for self-referential
    containers to be disabled.

``coerce_keys``
    In ``jsonlib``, forces non-string mapping keys to strings.

``default``
    In ``simplejson``, provides a hook for serializing user-defined
    types.

``indent``
    In ``simplejson``, an integer specifying the indentation level in
    spaces.

``on_unknown``
    In ``jsonlib``, serves the same purpose as simplejson's
    ``default``.

``separators``
    In ``simplejson``, allows the user to override the separators used
    for delimiting array and object values. There is no check
    performed as to whether this would produce invalid JSON. I think
    having this parameter is insane.

``skipkeys``
    In ``simplejson``, skips serializing mapping items with non-string
    keys.

``sort_keys``
    In ``jsonlib``, sorts mapping keys to provide consistent output
    for unit testing.

``strict``
    In ``demjson``, serves the same purpose as simplejson's
    ``allow_nan``.

Non-string Object Keys
----------------------

JSON allows only strings to be used as object keys. demjson in loose
mode allows non-string keys to be parsed, and simplejson will
automatically coerce some types to strings. simplejson has an option
for skipping non-string keys, and jsonlib has an option for coercing
them.

"Raw" atoms
-----------

JSON expressions must have an array or object as the outer-most value
-- that is, the expressions ``true``, ``42``, and ``"spam"`` are not
valid JSON. Strict-mode demjson and jsonlib raise exceptions when
parsing or generating such an expression; simplejson does not. This
"feature" is widely supported, but it might just be a non-obvious bug.

Trailing Commas
---------------

The text ``[1, 2, 3,]`` is valid in both JavaScript and Python, but is
invalid JSON. In Python, it is a list of three items. In JavaScript,
the ECMAScript specification also gives it a length of three, but some
implementations (notably Internet Explorer's JScript) treat it as an
array of length four with the items ``[1, 2, 3, undefined]``.

Alan Kennedy mentioned that his parser has an option to support
reading these, so presumably he has a use case for it. He didn't
mention what it was parsed as.

Function Names
--------------

There is no real agreement on what the public functions should be
named. simplejson uses ``load[s]`` and ``dump[s]``, modeled after the
``pickle`` module. demjson uses ``decode`` and ``encode``. jsonlib
uses ``read`` and ``write``, modeled after the ``python-json`` module.

This PEP uses ``parse`` and ``generate`` because that is what the
``email`` module uses.

Module Name
-----------

Probably ``json``, but there's been no actual discussion or consensus
on it that I know of.

Lint for JSON
-------------

demjson comes with lint-like functionality. It would be nice to have
this available in the standard library as well, so that invalid JSON
could be detected without having to actually parse it.

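Returning to the encoding autodetection described under `Parsing`_,
one way it could work is to look at the pattern of NUL bytes in the
first four octets, as RFC 4627 [2]_ suggests. The following is only a
minimal sketch: ``detect_encoding`` is a hypothetical name, it is not
the function from jsonlib or demjson, and it assumes the input carries
no byte order mark::

    # NUL-byte patterns of the first four octets (True means 0x00).
    # Anything else is assumed to be UTF-8.
    _NUL_PATTERNS = {
        (True, True, True, False): "utf-32-be",
        (True, False, True, False): "utf-16-be",
        (False, True, True, True): "utf-32-le",
        (False, True, False, True): "utf-16-le",
    }

    def detect_encoding(data):
        """Guess the Unicode encoding of a JSON byte string.

        Hypothetical helper for illustration; assumes no byte order
        mark and that the first two characters are ASCII.
        """
        head = bytearray(data[:4])
        if len(head) < 4:
            # The shortest array or object is two characters, which
            # takes four bytes in UTF-16 and eight in UTF-32, so
            # anything shorter can only be UTF-8.
            return "utf-8"
        pattern = tuple(byte == 0 for byte in head)
        return _NUL_PATTERNS.get(pattern, "utf-8")

The returned name can be passed to ``bytes.decode`` before handing the
resulting string to the parser. Whether such a function belongs in the
public API is the open question noted under `Other`_.
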
Resources
=========

* `Comparing JSON modules for Python`__, by Deron Meranda.

__ http://deron.meranda.us/python/comparing_json_modules/

References
==========

.. [1] Introducing JSON, contains general description of JSON and a
   list of implementations.
   (http://json.org/)

.. [2] RFC 4627
   (http://www.ietf.org/rfc/rfc4627.txt)

.. [3] http://pypi.python.org/pypi/simplejson/

.. [4] http://pypi.python.org/pypi/demjson/

.. [5] http://pypi.python.org/pypi/jsonlib/

.. [6] http://mail.python.org/pipermail/web-sig/2008-March/003332.html

.. [7] http://mail.python.org/pipermail/web-sig/2008-March/003343.html

.. [8] http://mail.python.org/pipermail/web-sig/2008-March/003336.html

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: