Re: hash issues [WAS] Re: [Tutor] hash()ing a list

jfouhy Tue, 29 Mar 2005 00:17:03 -0800

Quoting Brian van den Broek <[EMAIL PROTECTED]>:

> I had thought lookup was by hash value, and thus expected the access 
> to some_dict to cause troubles. Yet it worked. Is it that lookup is by 
> hash value, and then equality if need be so as to settle ambiguity, or 
> have I completely misunderstood the mechanism of dictionary lookup?


Hash functions are rarely perfect (in terms of mapping every input to a
different output) [1], so they have to deal with collisions.

AFAIK, the hash table lookup algorithm basically works like this:

    def lookup(self, key):
        hits = self.internalDataStructure[key]
        for key_, value in hits:
            if key_ == key:
                return value
        raise KeyError, 'Key not found'.

(except that it will be written in C and there may be advanced trickery involved
to improve efficiency in other ways)

HTH!

[1] The pigeonhole principle [2] tells us that it is rarely possible to make
perfect hash functions.  For example, there are more possible strings than there
are integers [3,4], so, in principle, you could find arbitrarily many strings
all with the same hash value.  In that case, a dictionary with them as keys
would be O(n), not O(1).  But a good hash function will make that very unlikely.
[2] http://en.wikipedia.org/wiki/Pigeonhole_principle
[3] Because every integer n has a corresponding string "n", but there are
strings which do not (directly) correspond to an integer (eg, "foo").
[4] Unless we allow integers to go to infinity... But computers generally do not
have infinite memory :-)
_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor

Re: hash issues [WAS] Re: [Tutor] hash()ing a list

Reply via email to