2008/4/22, Daniel Veillard <[EMAIL PROTECTED]>: > On Sun, Apr 20, 2008 at 10:51:38AM +0200, Stefan Behnel wrote: > > Hi again (and sorry for all the noise), > > > > Stefan Behnel wrote: > > > If an application benefits from a different hash function depends on the > > > vocabulary it uses in its XML files. A slow but well distributing hash > > > function performs much better for large vocabularies (or many different > > > vocabularies), while small vocabularies will not fill the dict enough to > make > > > a difference, in which case the faster hash function wins. > > > > So the obvious solution is to combine the two. Here is a patch that uses > the > > > Right, that's the right approach, optimize differently on the two end > of the spectrum. > > > > original hash function to start with (but lowers the bucket fill limit a > > little from 4 down to 3) and when it reaches the grow barrier for the first > > time, switches to the new hash function. You will find a performance > > comparison below, based on xmllint. > > > > I decreased the bucket fill barrier for two reasons: to trigger an early > > switch between the two functions, and because the second function has much > > better load balancing, so a high bucket size in one place really means that > > most buckets are at least close to that fill rate. As you can see from the > > > That's reasonnable, yes, no problem > > > > numbers, it works pretty well over a wide range of vocabulary sizes from > small > > to medium, and as I've shown before, it performs much (much!) better for > > larger sizes. > > > > BTW, Bob Jenkins did a comparison of a couple of hash functions, including > the > > additive hash (a variant of which is currently used) and the hash function > > used in the patch. > > > > http://burtleburtle.net/bob/hash/doobs.html > > > > The hash function itself was written by Paul Hsieh and published on his web > > site. According to Bob Jenkins, it's public domain (although I didn't ask > > directly). > > > > http://www.azillionmonkeys.com/qed/hash.html > > > > Any objections to getting this patch merged? > > > None, looks very good, thanks a lot for assembling all of this and providing > full and convincing numbers :-) > > I will apply this to SVN head now,
I just need to chime in; great work Stefan! Aron _______________________________________________ xml mailing list, project page http://xmlsoft.org/ xml@gnome.org http://mail.gnome.org/mailman/listinfo/xml