New submission from Dave Hibbitts:

__len__() always returns a Python int, whose range on Windows is tied to the 
size of a C long. That is always 32 bits there, even when Python is compiled 
for 64 bit. len(), however, returns an int for values up to sys.maxint and a 
long above that.
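
A quick way to check these widths on a given interpreter is sketched below. 
The helper name integer_widths is made up for illustration; it only uses 
ctypes.sizeof() and the sys module, and getattr() is used because sys.maxint 
exists only on Python 2.

```python
import ctypes
import sys


def integer_widths():
    """Return the bit widths of C long and Py_ssize_t for this interpreter."""
    return {
        "c_long_bits": ctypes.sizeof(ctypes.c_long) * 8,
        "ssize_t_bits": ctypes.sizeof(ctypes.c_ssize_t) * 8,
    }


widths = integer_widths()
print(widths)

# sys.maxint (Python 2 only) follows C long; sys.maxsize follows Py_ssize_t.
print(getattr(sys, "maxint", None), sys.maxsize)
```

If c_long_bits is smaller than ssize_t_bits, as described above for 64-bit 
Windows, a length can exceed what a C long holds.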

Because __len__() returns an int, it can return negative lengths for objects 
of size greater than sys.maxint. Below is a quick test that reproduces it.

And here's an explanation from /u/Rhomboid on Reddit of why we believe the 
issue happens.

"You'll only see that on Windows. The issue is that, confusingly, the range of 
the Python int type is tied to the range of the C long type. On Windows long is 
always 32 bits even on x64 systems, whereas on Unix systems it's the native 
machine word size. You can confirm this by checking sys.maxint, which will be 
2**31 - 1 even with a 64 bit interpreter on Windows.

The difference in behavior of foo.__len__ vs len(foo) is that the former goes 
through an attribute lookup which goes through the slot lookup stuff, finally 
ending in Python/typeobject.c:wrap_lenfunc(). The error is casting Py_ssize_t 
to long, which truncates on Windows x64 as Py_ssize_t is a proper signed 64 bit 
integer. And then it compounds the injury by creating a Python int object with 
PyInt_FromLong(), so this is hopelessly broken. In the case of len(foo), you 
end up in Python/bltinmodule.c:builtin_len() which skips all the attribute 
lookup stuff and uses the object protocol directly, calling PyObject_Size() and 
creating a Python object of the correct type via PyInt_FromSsize_t() which 
figures out whether a Python int or long is necessary.

This is definitely a bug that should be reported. In 3.x the int/long 
distinction is gone and all integers are Python longs, but the bogus cast to a 
C long still exists in wrap_lenfunc():

    return PyLong_FromLong((long)res);

That means the bug still exists even though the reason for its existence is 
gone! Oops. That needs to be updated to get rid of the cast and call 
PyLong_FromSsize_t()."

Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul  2 2014, 15:12:11) [MSC 
v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> a = 'a'*2500000000
>>> a.__len__()
-1794967296
>>> len(a)
2500000000L
>>> a = [1]*2500000000
>>> len(a)
2500000000L
>>> a.__len__()
-1794967296
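
The negative value above is consistent with a 64-bit length being truncated 
to a signed 32-bit integer. Here is a minimal sketch of that arithmetic in 
pure Python. The function truncate_to_signed32 is a hypothetical stand-in for 
the (long) cast on Windows, not the actual CPython code path.

```python
def truncate_to_signed32(n):
    """Wrap n into the signed 32-bit range, as a cast to a 32-bit long would."""
    return (n + 2 ** 31) % 2 ** 32 - 2 ** 31


size = 2500000000  # true length of the test objects

# Same value that a.__len__() reported in the session above.
print(truncate_to_signed32(size))  # -1794967296

# Lengths that fit in 32 bits pass through unchanged.
print(truncate_to_signed32(12345))  # 12345
```

Equivalently, 2500000000 - 2**32 == -1794967296.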

----------
components: Windows
messages: 260749
nosy: Dave Hibbitts, georg.brandl, paul.moore, steve.dower, tim.golden, 
zach.ware
priority: normal
severity: normal
status: open
title: __len__() returns 32 bit int on windows leading to overflows
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26423>
_______________________________________
