On Wed, Apr 9, 2008 at 10:05 AM, Ian Bicking <[EMAIL PROTECTED]> wrote:
> I strongly prefer we stick to the conventional names of 
> dump/dumps/load/loads, for consistency with other serialization libraries 
> already in Python.
>

On Wed, Apr 9, 2008 at 10:27 AM, Benji York <[EMAIL PROTECTED]> wrote:
> +1
>

On Wed, Apr 9, 2008 at 10:28 AM, Duncan McGreggor
<[EMAIL PROTECTED]> wrote:
> +1 for me too.
>

PEP updated to use dump/dumps/load/loads

On Wed, Apr 9, 2008 at 11:38 AM, Alan Kennedy <[EMAIL PROTECTED]> wrote:
>  Answer #2: I'm working (i.e. day job) with JSON at the moment: a
>  javascript client talking to a java server. The JS guy had a problem
>  last week with a sample JSON document I gave him to prototype on. I
>  wrote the sample by hand (it later became my freemarker template), and
>  so inadvertently left in a hard-to-spot dangling comma, from all the
>  copying and pasting. That broke his javascript library; he solved the
>  problem by passing it through a PHP JSON codec on his local Apache. It
>  worked, i.e. his problem disappeared, but he didn't know why (the PHP
>  lib had eliminated the dangling comma). Which all goes to confirm,
>  IMHO, that you should be liberal in what you consume and strict in
>  what you produce.
>
Sounds like a case *for* strict parsing, in my opinion. PHP's loose
parsing made it difficult to figure out why the JSON was invalid. If
trailing comma handling is to try to work around copy-paste errors, -1
from me.

>  I'm beginning to think that any putative JSON API should permit the
>  user to specify which class will be used to instantiate JSON objects.
>  If the users can specify their own classes, that might go a long way
>  to resolve issues such as "I need my javascript client to communicate
>  Numbers representing radians to my python server which uses Decimal
>  because it works better with my geo-positioning library". Standard
>  libraries should provide their own set of default instantiation
>  classes, which the user could override.
>
This is the float v. Decimal thing again -- load(s) might grow a
parameter for that, since it's hard to be both fast and correct. But
what is the use case for overriding the mappings for other JSON types,
like arrays or objects? If given the choice, I'd rather have a very
simple API in the stdlib that can be wrapped or implemented by third
parties if they need something weird, than a large API that is
difficult to implement fully.

PEP: XXX
Title: A JSON handling library
Version: $Revision$
Last-Modified: $Date$
Author: John Millikin <[EMAIL PROTECTED]>
Discussions-To: web-sig@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Apr-2008
Python-Version: 2.6


Abstract
========

This PEP describes a proposed library for parsing and generating
data in the JSON [1]_ format. JSON stands for "JavaScript Object
Notation", and is described by RFC 4627 [2]_.

Rationale
=========

JSON is a widely-used data interchange format, often used for sending
data to and from a web browser using JavaScript. Its simplicity and
ease of use have led to various implementations with varying degrees
of compliance to the RFC. By bundling a capable implementation in
Python's standard library, I hope to reduce or eliminate the need
for choosing a JSON library.

Existing Public Libraries
=========================

* Bob Ippolito's simplejson [3]_
* Deron Meranda's demjson [4]_
* John Millikin's jsonlib [5]_
* Alan Kennedy mentioned on web-sig [6]_ that he has written
  an implementation for Jython, named jyson, but has not released
  the source code.

Each of these has a different API, a different degree of strictness,
and a different quality of error handling.

Module Interface
================

Parsing
-------

Encoding Autodetection
''''''''''''''''''''''

The RFC requires that JSON be encoded in one of the Unicode encodings.
Because the first two characters of a valid JSON expression are always
from the ASCII set, it is possible to reliably determine the encoding of
input data. Functions for autodetecting encoding exist in jsonlib and
demjson.
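
The detection rule can be sketched as follows. This is only an
illustration of the null-byte pattern table in section 3 of the RFC,
written for a modern Python 3 interpreter; the function name
``detect_json_encoding`` is hypothetical and is not taken from jsonlib
or demjson.

```python
def detect_json_encoding(data):
    """Guess the Unicode encoding of JSON bytes (hypothetical helper).

    Relies on the first two characters being ASCII, so the position of
    the null bytes in the first four bytes identifies the encoding.
    """
    head = data[:4]
    if len(head) < 4:
        # Two ASCII characters need four bytes in UTF-16 and eight in
        # UTF-32, so anything shorter can only be UTF-8.
        return 'utf-8'
    nulls = tuple(byte == 0 for byte in head)
    if nulls == (True, True, True, False):
        return 'utf-32-be'   # 00 00 00 xx
    if nulls == (True, False, True, False):
        return 'utf-16-be'   # 00 xx 00 xx
    if nulls == (False, True, True, True):
        return 'utf-32-le'   # xx 00 00 00
    if nulls == (False, True, False, True):
        return 'utf-16-le'   # xx 00 xx 00
    return 'utf-8'           # xx xx xx xx


raw = '[1, 2]'.encode('utf-16-le')
text = raw.decode(detect_json_encoding(raw))
```

The sketch does not handle a byte order mark, which the RFC does not
expect at the start of a JSON text.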

Parsing API
'''''''''''

A JSON expression may be parsed using the ``load`` or ``loads`` functions::

  load(file)
  loads(bytes_or_string)

If the input is encoded as a byte stream, the encoding should be auto-detected
as above. If input has been received in a non-standard encoding, it can
be manually decoded and passed to ``loads`` as a string. The return
value is either a sequence or mapping, depending on the input.
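
As a rough illustration of the intended usage, the following uses the
``json`` module of a modern Python 3 interpreter (which descends from
simplejson) as a stand-in. It is an assumption that the proposed module
would behave the same way; the sample document is invented.

```python
import io
import json

document = '{"name": "spam", "sizes": [1, 2, 3]}'

# loads() parses a string that is already in memory.
from_string = json.loads(document)

# load() reads the expression from a file-like object.
with io.StringIO(document) as handle:
    from_file = json.load(handle)

# Input in a non-standard encoding is decoded by hand first, then
# passed to loads() as a string.
latin1_bytes = '["caf\u00e9"]'.encode('latin-1')
from_latin1 = json.loads(latin1_bytes.decode('latin-1'))
```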

Serialization
-------------

Python objects may be serialized using the ``dump`` and ``dumps``
functions::

  dump(obj, file, indent=None, ascii_only=True, encoding='utf-8')
  dumps(obj, indent=None, ascii_only=True, encoding='utf-8')

``indent`` is used to control pretty-printing. If ``None``, no pretty
printing will be performed and the output will be maximally compact.
If ``indent`` is a string, that string will be used for indenting
nested values. The only values allowed in ``indent`` are those that
are valid JSON whitespace; these are U+0009, U+000A, U+000D, and U+0020.

``ascii_only`` controls whether the output may contain characters above
the ASCII set. If ``True``, all non-ASCII characters must be escaped
using \\uXXXX syntax. Otherwise, non-ASCII characters will be included
without escaping. Depending on the output encoding and values of the
characters, this might be more size-efficient.

``encoding`` specifies how the output is to be encoded. If ``None``,
the output will be a Unicode string. By default, JSON is encoded in
UTF-8. If the encoding is ``None`` for ``dump()``, the file object must
accept unicode arguments to ``write()``.

Note: this is the set of options generally supported by implementations.
For a full treatment of other options, see `Options for Serialization`_.
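
One way to picture the ``dumps`` signature above is as a thin wrapper
over an existing serializer. The sketch below is not a reference
implementation: it assumes a modern Python 3 interpreter and maps the
proposed parameters onto that interpreter's ``json.dumps``
(``ensure_ascii``, ``indent``, ``separators``).

```python
import json

# The only characters JSON permits as whitespace.
JSON_WHITESPACE = frozenset('\t\n\r ')


def dumps(obj, indent=None, ascii_only=True, encoding='utf-8'):
    """Sketch of the proposed dumps(); `indent` is None or a string."""
    if indent is not None and not set(indent) <= JSON_WHITESPACE:
        raise ValueError('indent may only contain JSON whitespace')
    if indent is None:
        # No pretty-printing: drop the spaces after ',' and ':'.
        text = json.dumps(obj, ensure_ascii=ascii_only,
                          separators=(',', ':'))
    else:
        text = json.dumps(obj, ensure_ascii=ascii_only, indent=indent)
    # encoding=None means "return a Unicode string".
    return text if encoding is None else text.encode(encoding)


compact = dumps([1, 'caf\u00e9'])
unescaped = dumps(['caf\u00e9'], ascii_only=False, encoding=None)
```

With the defaults the result is a UTF-8 byte string containing only
ASCII characters; with ``encoding=None`` it is a Unicode string.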

Other
-----

XXX Should the encoding autodetection function be a part of the
public API?

Issues
======

Representation of Fractional Numbers
------------------------------------

The author of jsonlib feels that fractional numbers should be parsed
into an instance of ``decimal.Decimal``, to avoid issues with values
that cannot be represented exactly by the ``float`` type
[7]_.

  The spec does not require a decimal, but I dislike losing information
  in the parsing stage. Any implementation in the standard library
  should, in my opinion, at least offer a parameter for lossless parsing
  of number values.

The author of simplejson disagrees [8]_, saying that:

  Practically speaking I've tried using decimal instead of float for
  JSON and it's generally The Wrong Thing To Do. The spec doesn't say
  what to do about numbers, but for proper JavaScript interaction you
  want to do things that approximate what JS is going to do: 64-bit
  floating point.

demjson appears to have some sort of float precision detection
mechanism, and returns instances of ``float`` only if they can
represent a value exactly.
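
The trade-off can be seen in a small example. It assumes a modern
Python 3 interpreter, whose ``json.loads`` accepts a ``parse_float``
callable; that parameter is used here only to illustrate the kind of
"parameter for lossless parsing" discussed above.

```python
import json
from decimal import Decimal

text = '[0.1, 0.2, 1.10]'

# Default behaviour: binary floating point, as JavaScript would use.
as_float = json.loads(text)

# Lossless alternative: keep the digits exactly as written.
as_decimal = json.loads(text, parse_float=Decimal)

float_sum = as_float[0] + as_float[1]        # not exactly 0.3
decimal_sum = as_decimal[0] + as_decimal[1]  # exactly Decimal('0.3')
```

The ``Decimal`` result also preserves the trailing zero of ``1.10``,
which the ``float`` result cannot.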

Serializing User-defined Types
------------------------------

There should be some way for a user to specify how types not known
to the JSON library should be serialized. For example, Django
needs to serialize types related to date and time.

* simplejson supports a ``default`` parameter to ``dump`` and
  ``dumps``, which should be a callable that accepts a value and
  returns a serializable object.
* demjson supports a ``json_equivalent`` method of objects to
  encode, or users may subclass the ``demjson.JSON`` class and
  override the ``encode_default`` method.
* jsonlib supports an ``on_unknown`` parameter to ``write``, which
  acts like simplejson's ``default``.
* Alan Kennedy's implementation checks for a ``__json__`` method of
  objects to serialize [6]_.
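
A callable hook in the style of simplejson's ``default`` might be used
as sketched below. The example runs against the ``json`` module of a
modern Python 3 interpreter, which kept that parameter; the helper name
``encode_default`` and the choice of ISO 8601 strings for dates are
assumptions made for illustration.

```python
import datetime
import json


def encode_default(value):
    """Called only for values the serializer does not know about."""
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    raise TypeError('cannot serialize %r' % (value,))


record = {"created": datetime.date(2008, 4, 9)}
encoded = json.dumps(record, default=encode_default)
```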

Options for Serialization
-------------------------

There are options supported by only a few of the implementations:

``allow_nan``
  In ``simplejson``, allows Infinity and NaN to be serialized. These
  values are not supported by JSON, but are supported in JavaScript.
  
``check_circular``
  In ``simplejson``, allows the check for self-referential containers
  to be disabled.
  
``coerce_keys``
  In ``jsonlib``, forces non-string mapping keys to strings.
  
``default``
  In ``simplejson``, provides a hook for serializing user-defined
  types.
  
``indent``
  In ``simplejson``, an integer specifying the indentation level in
  spaces.
  
``on_unknown``
  In ``jsonlib``, serves the same purpose as simplejson's ``default``.
  
``separators``
  In ``simplejson``, allows the user to override the separators used
  for delimiting array and object values. There is no check performed
  as to whether this would produce invalid JSON. I think having this
  parameter is insane.
  
``skipkeys``
  In ``simplejson``, skips serializing mapping items with non-string
  keys.
  
``sort_keys``
  In ``jsonlib``, sorts mapping keys to provide consistent output for
  unit testing.
  
``strict``
  In ``demjson``, serves the same purpose as simplejson's
  ``allow_nan``.
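
A few of these options can be tried against the ``json`` module of a
modern Python 3 interpreter, which inherited simplejson's parameter
names. This is only a demonstration of the simplejson-style spellings,
not a proposal for which options to adopt.

```python
import json

# sort_keys gives stable output, which is convenient in unit tests.
stable = json.dumps({"b": 1, "a": 2}, sort_keys=True)

# allow_nan defaults to True and emits a JavaScript literal that is
# not valid JSON.
loose = json.dumps([float('inf')])


def rejects_nan():
    """Return True if allow_nan=False refuses to serialize NaN."""
    try:
        json.dumps([float('nan')], allow_nan=False)
    except ValueError:
        return True
    return False
```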

Non-string Object Keys
----------------------

JSON allows only strings to be used as object keys. demjson in loose
mode allows non-string keys to be parsed, and simplejson will
automatically coerce some types to strings. simplejson has an option
for skipping non-string keys, and jsonlib has an option for coercing
them.
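
The simplejson behaviour described above can be observed with the
``json`` module of a modern Python 3 interpreter, used here as a
stand-in. The keys chosen are arbitrary.

```python
import json

# Some non-string keys are coerced to strings automatically.
coerced = json.dumps({1: "one", None: "nothing"})

# Keys that cannot be coerced (a tuple here) raise TypeError unless
# skipkeys=True, in which case the item is silently dropped.
skipped = json.dumps({(1, 2): "pair", "ok": True}, skipkeys=True)
```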

"Raw" atoms
-----------

JSON expressions must have an array or object as the outer-most
value -- that is, the expressions ``true``, ``42``, and ``"spam"``
are not valid JSON. Strict-mode demjson and jsonlib raise exceptions
when parsing or generating such an expression; simplejson does not.

This "feature" is widely supported, but it might just be a non-obvious
bug.
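
A strict check is easy to layer on top of a permissive parser. The
sketch below assumes a modern Python 3 interpreter, whose ``json``
module follows simplejson in accepting raw atoms; the wrapper name
``loads_strict`` is hypothetical.

```python
import json


def loads_strict(text):
    """Parse JSON, rejecting anything but an array or object at the top."""
    value = json.loads(text)
    if not isinstance(value, (list, dict)):
        raise ValueError('top-level value must be an array or object')
    return value


permissive = json.loads('42')      # accepted by the permissive parser
wrapped = loads_strict('[42]')     # accepted by both
```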

Trailing Commas
---------------

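The text ``[1, 2, 3,]`` is valid in both JavaScript and Python, but
is invalid JSON. In Python, it is a list of three items. In JavaScript,
the result depends on the implementation: ECMAScript specifies an array
of length three, but some browsers (notably Internet Explorer) produce
an array of length four, ``[1, 2, 3, undefined]``.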

jyson [9]_ and loose-mode demjson accept arrays with trailing commas,
with Python semantics. Strict-mode demjson, jsonlib, and simplejson
raise exceptions.
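
The difference between Python semantics and strict JSON parsing can be
checked directly. The example assumes a modern Python 3 interpreter and
uses its ``json`` module as the strict parser; the helper name
``parses_as_json`` is hypothetical.

```python
import ast
import json

text = '[1, 2, 3,]'

# As a Python literal, the trailing comma is ignored.
python_value = ast.literal_eval(text)


def parses_as_json(candidate):
    """Return True if a strict JSON parser accepts the text."""
    try:
        json.loads(candidate)
    except ValueError:
        return False
    return True
```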

Module Name
-----------

Probably ``json``, but there's been no actual discussion or consensus
on it that I know of.

Lint for JSON
-------------

demjson comes with lint-like functionality. It would be nice to have
this available in the standard library as well, so that invalid JSON
could be detected without having to actually parse it.

Resources
=========

* `Comparing JSON modules for Python`__, by Deron Meranda.

  __ http://deron.meranda.us/python/comparing_json_modules/

References
==========

.. [1] Introducing JSON, contains general description of JSON and a list
   of implementations.
   (http://json.org/)

.. [2] RFC 4627
   (http://www.ietf.org/rfc/rfc4627.txt)

.. [3] http://pypi.python.org/pypi/simplejson/

.. [4] http://pypi.python.org/pypi/demjson/

.. [5] http://pypi.python.org/pypi/jsonlib/

.. [6] http://mail.python.org/pipermail/web-sig/2008-March/003332.html

.. [7] http://mail.python.org/pipermail/web-sig/2008-March/003343.html

.. [8] http://mail.python.org/pipermail/web-sig/2008-March/003336.html

.. [9] http://mail.python.org/pipermail/web-sig/2008-April/003383.html

Copyright
=========

This document has been placed in the public domain.


 
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
