2008/4/22, Daniel Veillard <[EMAIL PROTECTED]>:
> On Sun, Apr 20, 2008 at 10:51:38AM +0200, Stefan Behnel wrote:
>  > Hi again (and sorry for all the noise),
>  >
>  > Stefan Behnel wrote:
>  > > If an application benefits from a different hash function depends on the
>  > > vocabulary it uses in its XML files. A slow but well distributing hash
>  > > function performs much better for large vocabularies (or many different
>  > > vocabularies), while small vocabularies will not fill the dict enough to 
> make
>  > > a difference, in which case the faster hash function wins.
>  >
>  > So the obvious solution is to combine the two. Here is a patch that uses 
> the
>
>
>   Right, that's the right approach, optimize differently on the two end
>  of the spectrum.
>
>
>  > original hash function to start with (but lowers the bucket fill limit a
>  > little from 4 down to 3) and when it reaches the grow barrier for the first
>  > time, switches to the new hash function. You will find a performance
>  > comparison below, based on xmllint.
>  >
>  > I decreased the bucket fill barrier for two reasons: to trigger an early
>  > switch between the two functions, and because the second function has much
>  > better load balancing, so a high bucket size in one place really means that
>  > most buckets are at least close to that fill rate. As you can see from the
>
>
>   That's reasonnable, yes, no problem
>
>
>  > numbers, it works pretty well over a wide range of vocabulary sizes from 
> small
>  > to medium, and as I've shown before, it performs much (much!) better for
>  > larger sizes.
>  >
>  > BTW, Bob Jenkins did a comparison of a couple of hash functions, including 
> the
>  > additive hash (a variant of which is currently used) and the hash function
>  > used in the patch.
>  >
>  > http://burtleburtle.net/bob/hash/doobs.html
>  >
>  > The hash function itself was written by Paul Hsieh and published on his web
>  > site. According to Bob Jenkins, it's public domain (although I didn't ask
>  > directly).
>  >
>  > http://www.azillionmonkeys.com/qed/hash.html
>  >
>  > Any objections to getting this patch merged?
>
>
>   None, looks very good, thanks a lot for assembling all of this and providing
>  full and convincing numbers :-)
>
>   I will apply this to SVN head now,

I just need to chime in; great work Stefan!

Aron
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml

Reply via email to