Re: Absolutely horrible default string hashing

2009-05-04 Thread Sean Kelly
Andrei Alexandrescu wrote: I have exchanged email with Paul Hsieh about using his hash function in Phobos (azillionmonkeys.com) and he said his license is very permissive. I seem to recall I have introduced his hash function, with credit, in Phobos' old runtime but I am not sure whether Sean

Re: Absolutely horrible default string hashing

2009-05-04 Thread Steven Schveighoffer
On Sun, 03 May 2009 08:10:43 -0400, Frits van Bommel wrote: Jarrett Billingsley wrote: Here's something bizarre: porting this example to Tango yields 7 unique hashes, which is right in line with your estimate. But I think it's bizarre since, well, shouldn't druntime use _the same typein

Re: Absolutely horrible default string hashing

2009-05-03 Thread Steve Teale
dsimcha Wrote: > == Quote from Steve Teale (steve.te...@britseyeview.com)'s article > > Kristian Kilpi Wrote: > > > On Sun, 03 May 2009 15:33:12 +0300, Jérôme M. Berger > > > wrote: > > > > | > > > > | foreach(c; str) > > > > | { > > > > | ret = (ret << 4) + c; > > > > | } > > > > | > > > >

Re: Absolutely horrible default string hashing

2009-05-03 Thread Andrei Alexandrescu
dsimcha wrote: == Quote from Steve Teale (steve.te...@britseyeview.com)'s article Kristian Kilpi Wrote: On Sun, 03 May 2009 15:33:12 +0300, J�r�me M. Berger wrote: | | foreach(c; str) | { | ret = (ret << 4) + c; | } | That one is very bad because it only takes into account the las

Re: Absolutely horrible default string hashing

2009-05-03 Thread dsimcha
== Quote from Steve Teale (steve.te...@britseyeview.com)'s article > Kristian Kilpi Wrote: > > On Sun, 03 May 2009 15:33:12 +0300, J�r�me M. Berger > > wrote: > > > | > > > | foreach(c; str) > > > | { > > > | ret = (ret << 4) + c; > > > | } > > > | > > > That one is very bad because it only

Re: Absolutely horrible default string hashing

2009-05-03 Thread Steve Teale
Kristian Kilpi Wrote: > On Sun, 03 May 2009 15:33:12 +0300, Jérôme M. Berger > wrote: > > | > > | foreach(c; str) > > | { > > | ret = (ret << 4) + c; > > | } > > | > > That one is very bad because it only takes into account the last > > few characters of the string (how many depends on

Re: Absolutely horrible default string hashing

2009-05-03 Thread Kristian Kilpi
On Sun, 03 May 2009 15:33:12 +0300, Jérôme M. Berger wrote: | | foreach(c; str) | { | ret = (ret << 4) + c; | } | That one is very bad because it only takes into account the last few characters of the string (how many depends on the size of the hash). However, several hashing algori

Re: Absolutely horrible default string hashing

2009-05-03 Thread Bill Baxter
This guy looks to have a pretty good hash function: http://www.azillionmonkeys.com/qed/hash.html He matched Bob Jenkins' One-at-a-time hash on a variety of tests and beat it on speed. --bb On Sun, May 3, 2009 at 5:19 AM, Kristian Kilpi wrote: > On Sun, 03 May 2009 04:59:26 +0300, BCS wrote: >>

Re: Absolutely horrible default string hashing

2009-05-03 Thread Jérôme M. Berger
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Kristian Kilpi wrote: | On Sun, 03 May 2009 04:59:26 +0300, BCS wrote: |> |> IIRC something like this is common: |> |> hash_t ret = 0; |> foreach(c;str) { ret *= SomePrime; ret += c; } |> | | I think that's the basis for the best general string hashi

Re: Absolutely horrible default string hashing

2009-05-03 Thread Kristian Kilpi
On Sun, 03 May 2009 04:59:26 +0300, BCS wrote: IIRC something like this is common: hash_t ret = 0; foreach(c;str) { ret *= SomePrime; ret += c; } I think that's the basis for the best general string hashing funcs around. IIRC from the university, it doesn't matter, in practice, if the m

Re: Absolutely horrible default string hashing

2009-05-03 Thread Frits van Bommel
Jarrett Billingsley wrote: Here's something bizarre: porting this example to Tango yields 7 unique hashes, which is right in line with your estimate. But I think it's bizarre since, well, shouldn't druntime use _the same typeinfo code?_ No, Tango recently upgraded its hashing algorithms. T

Re: Absolutely horrible default string hashing

2009-05-02 Thread dsimcha
== Quote from bearophile (bearophileh...@lycos.com)'s article > dsimcha: > > we first need to fix D's string hashing.< > Ages ago I did suggest this one in the main D newsgroup: > http://www.azillionmonkeys.com/qed/hash.html > Bye, > bearophile On second thought...The real problem is just a strang

Re: Absolutely horrible default string hashing

2009-05-02 Thread Jarrett Billingsley
On Sat, May 2, 2009 at 1:00 PM, dsimcha wrote: > The result is that only about 8400 unique hashes are used, meaning 90+ % of > keys cannot be stored directly in the position corresponding to their hash. > Note that this is in full hash_t space, not in the modulus space typically > used for AAs, s

Re: Absolutely horrible default string hashing

2009-05-02 Thread dsimcha
== Quote from bearophile (bearophileh...@lycos.com)'s article > dsimcha: > > we first need to fix D's string hashing.< > Ages ago I did suggest this one in the main D newsgroup: > http://www.azillionmonkeys.com/qed/hash.html > Bye, > bearophile Yeah, upon further reverse engineering, the way strin

Re: Absolutely horrible default string hashing

2009-05-02 Thread bearophile
dsimcha: > we first need to fix D's string hashing.< Ages ago I did suggest this one in the main D newsgroup: http://www.azillionmonkeys.com/qed/hash.html Bye, bearophile

Absolutely horrible default string hashing

2009-05-02 Thread dsimcha
I've been playing around with associative array implementations in D, mostly trying to create ones that are more GC-friendly. I had written one that seemed much faster than the builtin for int and float keys but much slower for strings. I've found the reason: The builtin AAs use trees for collis