Serhiy Storchaka added the comment:
Patch updated (comment for load_binstring added).
--
Added file: http://bugs.python.org/file28097/pickle_nonportable_size_2.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
Roundup Robot added the comment:
New changeset 55fe4b57dd9c by Antoine Pitrou in branch '3.2':
Issue #12848: The pure Python pickle implementation now treats object lengths
as unsigned 32-bit integers, like the C implementation does.
http://hg.python.org/cpython/rev/55fe4b57dd9c
New changeset
Antoine Pitrou added the comment:
I've committed the latest patch (pickle_nonportable_size_2.patch). Thank you
for working on this!
--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed
Antoine Pitrou added the comment:
Here is a patch for 3.x. It unifies the behavior of the Python and C
implementations and unifies behavior on 32- and 64-bit platforms. For
backward compatibility, Pickler can pickle up to 2G of data, but Unpickler
can unpickle up to 4G on 64-bit.
I agree the right
Antoine Pitrou added the comment:
I'd like to add that anyone wanting to serialize large data will certainly be
using _pickle (or its ancestor cPickle), since using pickle.py is probably
excruciatingly slow. Meaning we should favour preserving _pickle/cPickle's
behaviour over preserving
Serhiy Storchaka added the comment:
The issue is not only the difference between the Python and C implementations,
but also the difference between 32-bit and 64-bit platforms.
pickle.py on 32-bit accepts data up to 2G.
pickle.py on 64-bit accepts data up to 2G.
_pickle.c on 32-bit accepts data up to 2G.
_pickle.c on 64-bit
Martin v. Löwis added the comment:
IMO, the right solution is to finish PEP 3154, and support large strings in the
format.
For the time being, I'd claim that signed lengths in the existing
implementations are just a bug, and that unsigned lengths are the intended
semantics of these opcodes. I
Serhiy Storchaka added the comment:
Here is a patch for 3.x which extends supported size to 4G on 64-bit.
--
Added file: http://bugs.python.org/file28010/pickle_nonportable_size.patch
Antoine Pitrou added the comment:
OTOH, I also think that it won't matter much in practice: if you try to
unpickle a string with more than 2GiB on a 32-bit system, chances are
really high that you run out of memory.
Agreed. I think this issue is mostly about 64-bit systems, even though we
Serhiy Storchaka added the comment:
Here is a patch for 3.x. It unifies the behavior of the Python and C
implementations and unifies behavior on 32- and 64-bit platforms. For backward
compatibility, Pickler can pickle up to 2G of data, but Unpickler can unpickle
up to 4G on 64-bit.
--
keywords:
Serhiy Storchaka added the comment:
The C implementation writes and reads BINBYTES and BINUNICODE up to 4G (on a
64-bit platform). The Python implementation writes and reads BINBYTES and
BINUNICODE up to 2G. What would be a compatible fix? Allow the Python
implementation to write and read up to
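A minimal sketch of what the length prefix looks like on the wire, assuming the
documented layout of BINBYTES (opcode byte b'B', then a 4-byte little-endian
length, then the payload). The pickle is built by hand purely for illustration;
the variable names are not from the patch, and pickle._Unpickler is the
underscore-prefixed pure Python class in pickle.py.

```python
import io
import pickle
import struct

payload = b"abc"

# BINBYTES opcode, 4-byte little-endian unsigned length, payload, then STOP.
data = b"B" + struct.pack("<I", len(payload)) + payload + b"."

# pickle.loads normally dispatches to the C accelerator (_pickle).
c_result = pickle.loads(data)

# pickle._Unpickler is the pure Python implementation from pickle.py.
py_result = pickle._Unpickler(io.BytesIO(data)).load()

print(c_result, py_result)
```

With a small payload both implementations agree; the discrepancy discussed here
only shows up once the length field has its top bit set, i.e. at 2 GiB or more.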
Serhiy Storchaka added the comment:
What if just add 0x?
--
nosy: +serhiy.storchaka
versions: +Python 3.4
Serhiy Storchaka added the comment:
Ah, for unpacking 32-bit unsigned big-endian bytes you can use len =
int.from_bytes(self.read(4), 'big').
Serhiy Storchaka added the comment:
Or you can use len = struct.unpack('I', self.read(4)).
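The two suggestions above can be sketched side by side. This is an
illustration under two assumptions that are not spelled out in the snippets:
the pickle length prefix is little-endian (hence '<I' and 'little'), and
struct.unpack returns a 1-tuple that must be unpacked. The names raw,
via_struct and via_int are made up for the example.

```python
import struct

# Four bytes encoding a length just above 2**31, as it would appear in a pickle.
raw = (2**31 + 5).to_bytes(4, "little")

# struct.unpack always returns a tuple, so unpack the single element.
via_struct, = struct.unpack("<I", raw)

# int.from_bytes is unsigned by default (signed=False).
via_int = int.from_bytes(raw, "little")

print(via_struct, via_int)
```

Both calls give the same non-negative value, so either would serve as an
unsigned reader; the choice between them is mostly a matter of speed and style.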
Alexandre Vassalotti added the comment:
pickle.py is the buggy one here. Its use of the marshal module is really a
hack. Plus, it is slower than both struct and int.from_bytes.
14:40:57 [~/cpython]$ ./python -m timeit "int.from_bytes(b'\xff\xff\xff\xff', 'big')"
100 loops, best of 3: 0.209
New submission from Antoine Pitrou pit...@free.fr:
In several opcodes (BINBYTES, BINUNICODE... what else?), _pickle.c happily
accepts 32-bit lengths of more than 2**31, while pickle.py uses marshal's i
typecode which means signed... and therefore fails reading the data.
Apparently, pickle.py
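To make the failure mode concrete, here is a small sketch of how the same four
length bytes decode under marshal's signed 'i' typecode versus an unsigned
struct format. It assumes the old pickle.py read lengths roughly as
marshal.loads(b'i' + four_bytes); the exact code in pickle.py is not shown in
this thread, so treat this as an approximation.

```python
import marshal
import struct

# A 4-byte little-endian length with the top bit set (just over 2 GiB).
raw = struct.pack("<I", 2**31 + 1)

# marshal's 'i' typecode is a signed 32-bit int, so the value wraps negative.
signed_length = marshal.loads(b"i" + raw)

# Reading the same bytes as unsigned gives the length the writer intended.
unsigned_length, = struct.unpack("<I", raw)

print(signed_length, unsigned_length)
```

A negative length cannot be passed on to a read call meaningfully, which is why
the signed interpretation breaks for data of 2 GiB or more.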