[Christian]
thanks Tim, for the partial enlightenment. :) Unfortunately my
BTree/C-wisdom is much much smaller than yours, so I got to check a
couple of assumptions over here. :)

Yup, it really helps if you have a 64-bit box to check them on.

[Christian]
Hah. Looks like BTrees can accept 2**31 on machines where maxint is a
larger ...

[Tim]
Note that "accept" in this case means "silently loses the
most-significant bits", and that's a BTree bug on platforms where
sizeof(long) > sizeof(int):

   http://www.zope.org/Collectors/Zope/1592

I was able to reproduce this on an Intel 64-Bit machine (EM64T) running
Linux and gcc.

And I expect the same will happen on any 64-bit platform other than
Win64 (which is unique, AFAICT, in leaving sizeof(long) == 4 instead
of boosting it to 8).

For one: I didn't see any compiler warnings. That sounds bad, right?

The underlying BTree bug is assignments of the form (usually "hidden"
in macro expansions):

   some_C_int = some_C_long;

When sizeof(int) < sizeof(long), that can silently lose information.
I was really hoping that major compilers on boxes where sizeof(int) <
sizeof(long) would warn about that.  Oh well.

The problem arises because what _Python_ calls "int" is what C calls
"long", but the I-flavor BTree code stores C "int", and C "int"
doesn't correspond to any Python type (except "by accident" on 32-bit
boxes).  C "int" is 4 bytes on all known current 32- and 64-bit
platforms, but the size of what Python calls "int" varies.  The BTree
code isn't aware of the possible mismatch, storing Python "int" (C
"long") into C "int" without any checking.

2**31 doesn't actually lose any bits when you store it, but it will
probably misinterpret the high-order data bit as the sign bit when you
fetch it again, magically changing it into -2**31.

I can store 2**31 in the BTree, but the keys() method will tell you that
it actually stored -2**31.

Right.  If it's not clear, this is because the _bit_ pattern

   0x80000000

is 2**31 when viewed as an 8-byte C long, but is -2**31 when viewed as
a 4-byte C long.  If you had, e.g., stored 2**32 instead, you would
have gotten 0 back when you fetched it (the top 32 bits are simply
thrown away).

...
Did you see my simpler suggestion for fixing the underlying bug (it
was a one-liner change to the original code)?  When you get tired of
fighting the 64-bit BTree bug here (it will be a minor miracle if the
test actually passes now on a 64-bit box, despite all you've tried
...), look that up ;-)

Nope, I didn't find your one-liner. :) Can you post it explicitly for my
blind eyes?

Here; it was a reply to one of the checkin messages:

   http://mail.zope.org/pipermail/checkins/2006-June/002395.html

The key problem in the original intid code is that it used randint()
instead of randrange(), theoretically allowing 2**31 to be a return
value.  It's _almost_ enough just to use randrange() instead.
Unfortunately, that's not quite enough; see the msg for what is.

It _could_ be fixed by adding 2 characters to the original, changing
2**31 to 2**31-2 (or to 0x7ffffffe), but that would leave it pretty
mysterious ;-)

I think I could have made up something that made it work, but I started
looking into making the BTree behave sanely.

My idea was, to modify the BTree code in a way that it actually checks
after the type cast whether the casted value is equal to the requested
key, or alternatively try making the CHECK_KEY macro do an "exact type
match" instead of allowing subclasses. But that wouldn't work either as
2**31 is still an int on those platforms. I'm a bit puzzled now. :)

There's potential silent information loss for both Ix-flavor BTree
keys and xI-flavor BTree values.  There are two robust ways to check
("robust" means that Python's C code has used these ways at various
times for many years now without problems).  Complain if:

   some_C_long < INT_MIN || some_C_long > INT_MAX

or complain if (this sounds close to what you had in mind above, and
is my favorite):

   (long)(int)some_C_long != some_C_long

Because the problem is due to bogus assumptions about the relationship
between C types, it's not going to help to examine Python's idea of
type.

Checking isn't needed (can't fail) if SIZEOF_INT == SIZEOF_LONG
(Python.h supplies definitions for those macros), so there's some
worth to skipping checks when that's not true.  Unfortunately, C
doesn't allow "#if" preprocessor statements _inside_ macro expansions,
so the best way to do that isn't immediately clear.

In short, irritiating little issues abound :-(.  That's why I couldn't
make time to fix it (relatively high cost with no benefit on most Zope
platforms).

Note that if ZODB moves to 64-bit Ix/xI BTrees on all boxes (IIRC, Jim
and Fred were agitating in that direction, but suffered massive
short-sighted ;-) opposition), the BTree problem would go away by
magic (C "int" would no longer be the type of Ix keys or xI values).
_______________________________________________
Zope3-dev mailing list
Zope3-dev@zope.org
Unsub: http://mail.zope.org/mailman/options/zope3-dev/archive%40mail-archive.com

Reply via email to