[Zope-dev] zope.keyreference hashes vs. 32/64bit

2010-08-28 Thread Hanno Schlichting
Hi.

I've recently stumbled on some at least to me unexpected behavior with
zope.keyreference. For a persistent object it generates a unique key
using:

hash((database_name, oid))

where hash is Python's built-in hash function.

Reading the documentation, I assumed that a keyreference for the same
object (as identified by database name and oid) should be stable and
always produce the same result. This doesn't always hold if you persist
keyreference data, upgrade your software versions and compare the
stored data to a freshly calculated value.

Python's hash function is only stable inside the same Python version
and 32/64 bit combination. The same input in a 32bit Python 2.6 and
64bit Python 2.6 produces different results, as both try to use the
maximum available integer space and thus a 64bit Python generates keys
above the 32-bit integer range. As a simple example,
"hash(('main', 1)) > 2**32" is True in a 64bit Python and False in a
32bit Python.
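
A quick way to see which case applies to a given interpreter is sketched below. This is only an illustration: the variable names are made up, and the fallback for interpreters without sys.hash_info (added in Python 3.2) simply assumes the hash is as wide as a pointer.

```python
import struct
import sys

# Pointer width of the running interpreter, in bits (32 or 64).
pointer_bits = struct.calcsize("P") * 8

# sys.hash_info only exists on Python 3.2 and later. On older interpreters
# we assume (not guaranteed) that the hash is as wide as a pointer.
hash_bits = getattr(getattr(sys, "hash_info", None), "width", pointer_bits)

# Note: on Python 3.3 and later, string hashing is randomized per process
# unless PYTHONHASHSEED is set, so this value can also differ between runs.
value = hash(("main", 1))

print("pointer width: %d bits" % pointer_bits)
print("hash width:    %d bits" % hash_bits)
print("fits in signed 32 bits: %s" % (-2**31 <= value < 2**31))
```

The last line reports whether this particular hash value would survive being stored in a 32-bit integer field.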

The internal hash implementation seems to have been pretty stable in
all the latest Python versions up to 3.1. So the algorithm produces
the same results for all 32bit versions of Python 2.x to 3.1 and 64bit
respectively. But as far as I understand this isn't guaranteed to be
the case for future versions.

Does anyone else see a problem with this? Should keyreference use a
different hash algorithm?

Hanno
___
Zope-Dev maillist  -  Zope-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 https://mail.zope.org/mailman/listinfo/zope-announce
 https://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] zope.keyreference hashes vs. 32/64bit

2010-08-28 Thread Jim Fulton
On Sat, Aug 28, 2010 at 12:17 PM, Hanno Schlichting wrote:
> Hi.
>
> I've recently stumbled on some at least to me unexpected behavior with
> zope.keyreference.

Specifically, zope.keyreference.persistent, I assume.

> For a persistent object it generates a unique key
> using:
>
> hash((database_name, oid))

No, it generates a hash this way.

>
> where hash is Python's built-in hash function.
>
> Reading the documentation I assumed that a keyreference for the same
> object (as identified by database name and oid) should be stable and
> always produce the same result. This isn't always true, when you look
> up persisted keyreference data, upgrade your software versions and
> compare it to a new calculation.
>
> Python's hash function is only stable inside the same Python version
> and 32/64 bit combination. The same input in a 32bit Python 2.6 and
> 64bit Python 2.6 produces different results, as both try to use the
> maximum available integer space and thus a 64bit Python generates keys
> above the 32-bit integer range. As a simple example "hash(('main', 1)) > 2**32"
> is True in a 64bit Python and False in a 32bit Python.
>
> The internal hash implementation seems to have been pretty stable in
> all the latest Python versions up to 3.1. So the algorithm produces
> the same results for all 32bit versions of Python 2.x to 3.1 and 64bit
> respectively. But as far as I understand this isn't guaranteed to be
> the case for future versions.
>
> Does anyone else see a problem with this? Should keyreference use a
> different hash algorithm?

Potentially, yes.  In current practice, I don't think so.

When a key reference is used as a BTree key, its comparison function,
rather than its hash, is used.
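
To illustrate the idea of comparison-based lookup, here is a minimal sketch that uses the standard library's bisect module as a stand-in for a BTree. KeyRef and find are hypothetical names, and ordering by (database name, oid) is an assumption made for the example, not a statement about the real implementation.

```python
import bisect


class KeyRef(object):
    """Hypothetical stand-in for a key reference, ordered by (db name, oid)."""

    def __init__(self, database_name, oid):
        self.key = (database_name, oid)

    def __lt__(self, other):
        return self.key < other.key

    def __eq__(self, other):
        return self.key == other.key

    def __hash__(self):
        # Deliberately unusable, to show that ordered lookup never needs it.
        raise TypeError("hash is not used for ordered lookup")


def find(sorted_refs, ref):
    """Binary search using comparisons only; return the index or -1."""
    i = bisect.bisect_left(sorted_refs, ref)
    if i < len(sorted_refs) and sorted_refs[i] == ref:
        return i
    return -1


refs = sorted([KeyRef("main", 3), KeyRef("main", 1), KeyRef("main", 2)])
index = find(refs, KeyRef("main", 2))
```

Sorting and searching both succeed even though __hash__ raises, so a change in the hash value cannot affect where a key is found in an ordered structure.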

If a key reference hash was used as a persistent key, then this would
definitely be a problem.

Note that in a dictionary or PersistentMapping, the hash isn't
saved persistently. The object is saved as a collection of items and the
hashes are recomputed on unpickling.
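
For a plain dictionary this can be observed with a small experiment. The sketch below assumes it is run as a script, so that pickle can locate the class; Key is a hypothetical type that counts how often it is hashed.

```python
import pickle


class Key(object):
    """Hypothetical key type that counts how often it is hashed."""

    hash_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        Key.hash_calls += 1
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, Key) and self.name == other.name


data = pickle.dumps({Key("a"): 1})

# Reset the counter so that only hashing done while loading is counted.
Key.hash_calls = 0
restored = pickle.loads(data)
calls_during_load = Key.hash_calls
```

If the stored hash were reused, calls_during_load would stay at zero; instead the key is hashed again while the dictionary is rebuilt.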

I'm in favor of someone coming up with a stable hash to
avoid future pitfalls.
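
One possible shape for such a hash is sketched below. It is only an illustration under stated assumptions: stable_hash is a hypothetical helper, not part of zope.keyreference, the oid is assumed to be a byte string, and the choice of SHA-256 truncated to a signed 32-bit integer is arbitrary.

```python
import hashlib
import struct


def stable_hash(database_name, oid):
    """Return a signed 32-bit integer derived from (database_name, oid).

    Hypothetical helper: the result depends only on the input bytes,
    not on the Python version or the platform word size.
    """
    if not isinstance(database_name, bytes):
        database_name = database_name.encode("utf-8")
    if not isinstance(oid, bytes):
        raise TypeError("oid is assumed to be a byte string")
    # Length-prefix the name so that different (name, oid) splits of the
    # same byte sequence produce different payloads.
    payload = struct.pack(">I", len(database_name)) + database_name + oid
    digest = hashlib.sha256(payload).digest()
    # Interpret the first four bytes as a big-endian signed 32-bit integer.
    return struct.unpack(">i", digest[:4])[0]


example = stable_hash("main", b"\x00" * 7 + b"\x01")
```

Keeping the result within the signed 32-bit range means the same value is valid as a hash on both 32bit and 64bit interpreters.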

It's sad that Python's hash isn't stable across Python versions
and architectures. Is this documented? If so, it's a misfeature.
If not, perhaps it should be reported as a bug.

Jim

-- 
Jim Fulton


Re: [Zope-dev] zope.keyreference hashes vs. 32/64bit

2010-08-28 Thread Hanno Schlichting
Hi.

On Sat, Aug 28, 2010 at 8:47 PM, Jim Fulton wrote:
> On Sat, Aug 28, 2010 at 12:17 PM, Hanno Schlichting wrote:
>> I've recently stumbled on some at least to me unexpected behavior with
>> zope.keyreference.
>
> Specifically, zope.keyreference.persistent, I assume.

Yes.

>> Does anyone else see a problem with this? Should keyreference use a
>> different hash algorithm?
>
> Potentially, yes.  In current practice, I don't think so.
>
> When a key reference is used as a BTree key, its comparison function,
> rather than its hash, is used.
>
> If a key reference hash was used as a persistent key, then this would
> definitely be a problem.
>
> Note that in a dictionary or PersistentMapping, the hash isn't
> saved persistently. The object is saved as a collection of items and the
> hashes are recomputed on unpickling.

Ah right. This makes it less likely to be a problem in practice.

> I'm in favor of someone coming up with a stable hash to
> avoid future pitfalls.
>
> It's sad that Python's hash isn't stable across Python versions
> and architectures. Is this documented? If so, it's a misfeature.
> If not, perhaps it should be reported as a bug.

The official Python documentation doesn't specify anything explicitly,
but it also doesn't describe the algorithm or state that it's stable.

You do immediately find http://effbot.org/zone/python-hash.htm
when googling for "python hash" though. It notes that the algorithm
changed in Python 2.4. According to Python's NEWS file, the hash
algorithm changed again in Python 3.2 alpha 1, with a reference to
issue 8188.

Hanno


Re: [Zope-dev] zope.keyreference hashes vs. 32/64bit

2010-08-28 Thread Jim Fulton
On Sat, Aug 28, 2010 at 3:12 PM, Hanno Schlichting wrote:
> Hi.
>
> On Sat, Aug 28, 2010 at 8:47 PM, Jim Fulton wrote:
>> On Sat, Aug 28, 2010 at 12:17 PM, Hanno Schlichting wrote:
>>> I've recently stumbled on some at least to me unexpected behavior with
>>> zope.keyreference.
>>
>> Specifically, zope.keyreference.persistent, I assume.
>
> Yes.
>
>>> Does anyone else see a problem with this? Should keyreference use a
>>> different hash algorithm?
>>
>> Potentially, yes.  In current practice, I don't think so.
>>
>> When a key reference is used as a BTree key, its comparison function,
>> rather than its hash, is used.
>>
>> If a key reference hash was used as a persistent key, then this would
>> definitely be a problem.
>>
>> Note that in a dictionary or PersistentMapping, the hash isn't
>> saved persistently. The object is saved as a collection of items and the
>> hashes are recomputed on unpickling.
>
> Ah right. This makes it less likely to be a problem in practice.
>
>> I'm in favor of someone coming up with a stable hash to
>> avoid future pitfalls.
>>
>> It's sad that Python's hash isn't stable across Python versions
>> and architectures. Is this documented? If so, it's a misfeature.
>> If not, perhaps it should be reported as a bug.
>
> The official Python documentation doesn't specify anything explicitly,
> but it also doesn't describe the algorithm or state that it's stable.
>
> You do immediately find http://effbot.org/zone/python-hash.htm
> googling for "python hash" though. This notes that the algorithm
> changed in Python 2.4. Looking at the NEWS file of Python, the hash
> algorithm has again changed in Python 3.2 alpha 1 referencing issue
> 8188.

If someone has the energy, I think it's worth trying to report this as a
bug. :)

Jim


-- 
Jim Fulton