On Sun, Apr 20, 2008 at 10:51:38AM +0200, Stefan Behnel wrote:
> Hi again (and sorry for all the noise),
> 
> Stefan Behnel wrote:
> > If an application benefits from a different hash function depends on the
> > vocabulary it uses in its XML files. A slow but well distributing hash
> > function performs much better for large vocabularies (or many different
> > vocabularies), while small vocabularies will not fill the dict enough to 
> > make
> > a difference, in which case the faster hash function wins.
> 
> So the obvious solution is to combine the two. Here is a patch that uses the

  Right, that's the right approach, optimize differently on the two end
of the spectrum.

> original hash function to start with (but lowers the bucket fill limit a
> little from 4 down to 3) and when it reaches the grow barrier for the first
> time, switches to the new hash function. You will find a performance
> comparison below, based on xmllint.
> 
> I decreased the bucket fill barrier for two reasons: to trigger an early
> switch between the two functions, and because the second function has much
> better load balancing, so a high bucket size in one place really means that
> most buckets are at least close to that fill rate. As you can see from the

  That's reasonnable, yes, no problem

> numbers, it works pretty well over a wide range of vocabulary sizes from small
> to medium, and as I've shown before, it performs much (much!) better for
> larger sizes.
> 
> BTW, Bob Jenkins did a comparison of a couple of hash functions, including the
> additive hash (a variant of which is currently used) and the hash function
> used in the patch.
> 
> http://burtleburtle.net/bob/hash/doobs.html
> 
> The hash function itself was written by Paul Hsieh and published on his web
> site. According to Bob Jenkins, it's public domain (although I didn't ask
> directly).
> 
> http://www.azillionmonkeys.com/qed/hash.html
> 
> Any objections to getting this patch merged?

  None, looks very good, thanks a lot for assembling all of this and providing
full and convincing numbers :-)

  I will apply this to SVN head now,

  thanks again !

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml

Reply via email to