On Wed, Apr 9, 2008 at 10:05 AM, Ian Bicking <[EMAIL PROTECTED]> wrote:
> I strongly prefer we stick to the conventional names of
> dump/dumps/load/loads, for consistency with other serialization libraries
> already in Python.
>
On Wed, Apr 9, 2008 at 10:27 AM, Benji York <[EMAIL PROTECTED]> wrote:
> +1
>

On Wed, Apr 9, 2008 at 10:28 AM, Duncan McGreggor <[EMAIL PROTECTED]> wrote:
> +1 for me too.
>

PEP updated to use dump/dumps/load/loads

On Wed, Apr 9, 2008 at 11:38 AM, Alan Kennedy <[EMAIL PROTECTED]> wrote:
> Answer #2: I'm working (i.e. day job) with JSON at the moment: a
> javascript client talking to a java server. The JS guy had a problem
> last week with a sample JSON document I gave him to prototype on. I
> wrote the sample by hand (it later became my freemarker template), and
> so inadvertently left in a hard-to-spot dangling comma, from all the
> copying and pasting. That broke his javascript library; he solved the
> problem by passing it through a PHP JSON codec on his local Apache. It
> worked, i.e. his problem disappeared, but he didn't know why (the PHP
> lib had eliminated the dangling comma). Which all goes to confirm,
> IMHO, that you should be liberal in what you consume and strict in
> what you produce.
>

Sounds like a case *for* strict parsing, in my opinion. PHP's loose
parsing made it difficult to figure out why the JSON was invalid. If
the point of trailing-comma handling is to work around copy-paste
errors, -1 from me.

> I'm beginning to think that any putative JSON API should permit the
> user to specify which class will be used to instantiate JSON objects.
> If the users can specify their own classes, that might go a long way
> to resolve issues such as "I need my javascript client to communicate
> Numbers representing radians to my python server which uses Decimal
> because it works better with my geo-positioning library". Standard
> libraries should provide their own set of default instantiation
> classes, which the user could override.
>

This is the float v. Decimal thing again -- load(s) might grow a
parameter for that, since it's hard to be both fast and correct. But
what is the use case for overriding the mappings for other JSON types,
like arrays or objects?

If given the choice, I'd rather have a very simple API in the stdlib that can be wrapped or implemented by third parties if they need something weird, than a large API that is difficult to implement fully.

PEP: XXX
Title: A JSON handling library
Version: $Revision$
Last-Modified: $Date$
Author: John Millikin <[EMAIL PROTECTED]>
Discussions-To: web-sig@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Apr-2008
Python-Version: 2.6

Abstract
========

This PEP describes a proposed library for parsing and generating data
in the `JSON` [1]_ format. JSON stands for "JavaScript Object
Notation", and is described by RFC 4627 [2]_.

Rationale
=========

JSON is a widely-used data interchange format, often used for sending
data to and from a web browser using JavaScript. Its simplicity and
ease of use have led to various implementations with varying degrees
of compliance with the RFC. By bundling a capable implementation in
Python's standard library, I hope to reduce or eliminate the need for
choosing a JSON library.

Existing Public Libraries
=========================

* Bob Ippolito's simplejson [3]_

* Deron Meranda's demjson [4]_

* John Millikin's jsonlib [5]_

* Alan Kennedy mentioned on web-sig [6]_ that he has written an
  implementation for Jython, named jyson, but has not released the
  source code.

Each of these has a different API, a different degree of strictness,
and a different quality of error handling.

Module Interface
================

Parsing
-------

Encoding Autodetection
''''''''''''''''''''''

The RFC requires that JSON be encoded in one of the Unicode encodings.
Because the first two characters of a valid JSON expression are always
from the ASCII set, it is possible to reliably determine the encoding
of input data from the pattern of its leading bytes. Functions for
autodetecting encoding exist in jsonlib and demjson.

Parsing API
'''''''''''

A JSON expression may be parsed using the ``load`` or ``loads``
functions::

    load (file)
    loads (bytes_or_string)

If the input is encoded as a byte stream, the encoding should be
auto-detected as above. If input has been received in a non-standard
encoding, it can be manually decoded and passed to ``loads`` as a
string.

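As an illustration of the autodetection described above, here is a
minimal sketch based on the null-byte patterns listed in section 3 of
RFC 4627. The function name ``detect_json_encoding`` is hypothetical
and not part of the proposed API, and falling back to UTF-8 for inputs
shorter than four bytes is an assumption of this sketch.

```python
def detect_json_encoding(data):
    """Guess the Unicode encoding of JSON bytes (hypothetical helper).

    Relies on the first two characters of a JSON text being ASCII, so the
    positions of NUL bytes among the first four bytes identify the encoding.
    """
    head = data[:4]
    if len(head) < 4:
        # Assumption of this sketch: too short to tell, treat as UTF-8.
        return "utf-8"
    nul = [byte == 0 for byte in head]
    if nul == [True, True, True, False]:
        return "utf-32-be"   # 00 00 00 xx
    if nul == [False, True, True, True]:
        return "utf-32-le"   # xx 00 00 00
    if nul == [True, False, True, False]:
        return "utf-16-be"   # 00 xx 00 xx
    if nul == [False, True, False, True]:
        return "utf-16-le"   # xx 00 xx 00
    return "utf-8"           # xx xx xx xx


# Usage: decode the bytes before handing the text to a parser.
raw = "[1, 2]".encode("utf-16-le")
text = raw.decode(detect_json_encoding(raw))
```

The result is a codec name that can be passed to ``bytes.decode()``;
the decoded string can then be given to ``loads``.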
The return value is either a sequence or a mapping, depending on the
input.

Serialization
-------------

Python objects may be serialized using the ``dump`` and ``dumps``
functions::

    dump (obj, file, indent = None, ascii_only = True, encoding = 'utf-8')
    dumps (obj, indent = None, ascii_only = True, encoding = 'utf-8')

``indent`` is used to control pretty-printing. If ``None``, no pretty
printing will be performed and the output will be maximally compact.
If ``indent`` is a string, that string will be used for indenting
nested values. The only characters allowed in ``indent`` are those
that are valid JSON whitespace; these are U+0009, U+000A, U+000D, and
U+0020.

``ascii_only`` controls whether the output may contain characters
above the ASCII set. If ``True``, all non-ASCII characters must be
escaped using ``\uXXXX`` syntax. Otherwise, non-ASCII characters will
be included without escaping. Depending on the output encoding and
values of the characters, this might be more size-efficient.

``encoding`` specifies how the output is to be encoded. If ``None``,
the output will be a Unicode string. By default, JSON is encoded in
UTF-8. If the encoding is ``None`` for ``dump()``, the file object
must accept unicode arguments to ``write()``.

Note: this is the set of options generally supported by
implementations. For a full treatment of other options, see
`Options for Serialization`_.

Other
-----

XXX Should the encoding autodetection function be a part of the public
API?

Issues
======

Representation of Fractional Numbers
------------------------------------

The author of jsonlib feels that fractional numbers should be parsed
into an instance of ``decimal.Decimal``, to avoid issues with values
that cannot be represented exactly by the ``float`` type [7]_. The
spec does not require a decimal, but I dislike losing information in
the parsing stage. Any implementation in the standard library should,
in my opinion, at least offer a parameter for lossless parsing of
number values.

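To make the information loss concrete, the following sketch compares
the two representations for the same number text. It uses only the
standard ``decimal`` module; the variable names are illustrative and
nothing here is part of the proposed API.

```python
from decimal import Decimal

number_text = "1.1"  # a fractional number as it would appear in JSON

as_float = float(number_text)
as_decimal = Decimal(number_text)

# Converting the float to Decimal exposes the binary approximation,
# so it no longer equals the value that was written in the text.
float_is_exact = Decimal(as_float) == as_decimal

# The approximation also shows up in ordinary arithmetic.
float_sum_matches = (1.1 + 2.2 == 3.3)
decimal_sum_matches = (Decimal("1.1") + Decimal("2.2") == Decimal("3.3"))
```

Here ``float_is_exact`` and ``float_sum_matches`` are ``False`` while
``decimal_sum_matches`` is ``True``, which is the kind of discrepancy
a lossless parsing parameter would let callers avoid.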
The author of simplejson disagrees [8]_, saying that:

    Practically speaking I've tried using decimal instead of float for
    JSON and it's generally The Wrong Thing To Do. The spec doesn't
    say what to do about numbers, but for proper JavaScript
    interaction you want to do things that approximate what JS is
    going to do: 64-bit floating point.

demjson appears to have some sort of float precision detection
mechanism, and returns instances of ``float`` only if they can
represent a value exactly.

Serializing User-defined Types
------------------------------

There should be some way for a user to specify how types not known to
the JSON library should be serialized. For example, Django needs to
serialize types related to date and time.

* simplejson supports a ``default`` parameter to ``dump`` and
  ``dumps``, which should be a callable that accepts a value and
  returns a serializable object.

* demjson supports a ``json_equivalent`` method of objects to encode,
  or users may subclass the ``demjson.JSON`` class and override the
  ``encode_default`` method.

* jsonlib supports an ``on_unknown`` parameter to ``write``, which
  acts like simplejson's ``default``.

* Alan Kennedy's implementation checks for a ``__json__`` method of
  objects to serialize [6]_.

Options for Serialization
-------------------------

There are options supported by only a few of the implementations:

``allow_nan``
    In ``simplejson``, allows Infinity and NaN to be serialized. These
    values are not supported by JSON, but are supported in JavaScript.

``check_circular``
    In ``simplejson``, allows the check for self-referential
    containers to be disabled.

``coerce_keys``
    In ``jsonlib``, forces non-string mapping keys to strings.

``default``
    In ``simplejson``, provides a hook for serializing user-defined
    types.

``indent``
    In ``simplejson``, an integer specifying the indentation level in
    spaces.

``on_unknown``
    In ``jsonlib``, serves the same purpose as simplejson's
    ``default``.

``separators``
    In ``simplejson``, allows the user to override the separators used
    for delimiting array and object values. There is no check
    performed as to whether this would produce invalid JSON. I think
    having this parameter is insane.

``skipkeys``
    In ``simplejson``, skips serializing mapping items with non-string
    keys.

``sort_keys``
    In ``jsonlib``, sorts mapping keys to provide consistent output
    for unit testing.

``strict``
    In ``demjson``, serves the same purpose as simplejson's
    ``allow_nan``.

Non-string Object Keys
----------------------

JSON allows only strings to be used as object keys. demjson in loose
mode allows non-string keys to be parsed, and simplejson will
automatically coerce some types to strings. simplejson has an option
for skipping non-string keys, and jsonlib has an option for coercing
them.

"Raw" atoms
-----------

JSON expressions must have an array or object as the outer-most
value -- that is, the expressions ``true``, ``42``, and ``"spam"`` are
not valid JSON. Strict-mode demjson and jsonlib raise exceptions when
parsing or generating such an expression; simplejson does not. This
"feature" is widely supported, but it might just be a non-obvious bug.

Trailing Commas
---------------

The text ``[1, 2, 3,]`` is valid in both JavaScript and Python, but is
invalid JSON. In JavaScript, the result depends on the implementation:
the ECMAScript specification ignores the trailing comma, giving an
array of three items, while some older browsers (notably Internet
Explorer) produce an array of length four with the items
``[1, 2, 3, undefined]``. In Python, it is a list of three items.

jyson [9]_ and loose-mode demjson accept arrays with trailing commas,
with Python semantics. Strict-mode demjson, jsonlib, and simplejson
raise exceptions.

Module Name
-----------

Probably ``json``, but there's been no actual discussion or consensus
on it that I know of.

Lint for JSON
-------------

demjson comes with lint-like functionality. It would be nice to have
this available in the standard library as well, so that invalid JSON
could be detected without having to actually parse it.

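As a rough idea of what a lint-style check could look like, here is a
minimal sketch. It is not demjson's lint and not part of this
proposal: the helper name ``lint_json`` is hypothetical, it uses the
``json`` module found in current Python releases as a stand-in strict
parser, and unlike the wish expressed above it does parse the text and
simply discards the result.

```python
import json


def lint_json(text):
    """Return None if ``text`` parses as JSON, else an error message.

    Hypothetical helper: parses with the standard ``json`` module and
    throws the parsed value away, keeping only the diagnostic.
    """
    try:
        json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        return str(exc)
    return None


# Usage: a trailing comma is reported, a well-formed array is not.
good = lint_json("[1, 2, 3]")
bad = lint_json("[1, 2, 3,]")
```

Here ``good`` is ``None`` and ``bad`` is a non-empty message. Note that
this stand-in parser is more lenient than the strict modes discussed
above in at least one respect: it accepts top-level atoms such as
``42``.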
Resources
=========

* `Comparing JSON modules for Python`__, by Deron Meranda.

__ http://deron.meranda.us/python/comparing_json_modules/

References
==========

.. [1] Introducing JSON, contains general description of JSON and a
   list of implementations.
   (http://json.org/)

.. [2] RFC 4627
   (http://www.ietf.org/rfc/rfc4627.txt)

.. [3] http://pypi.python.org/pypi/simplejson/

.. [4] http://pypi.python.org/pypi/demjson/

.. [5] http://pypi.python.org/pypi/jsonlib/

.. [6] http://mail.python.org/pipermail/web-sig/2008-March/003332.html

.. [7] http://mail.python.org/pipermail/web-sig/2008-March/003343.html

.. [8] http://mail.python.org/pipermail/web-sig/2008-March/003336.html

.. [9] http://mail.python.org/pipermail/web-sig/2008-April/003383.html

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: