Re: [xml] Better hash function for dict.c

2008-08-07 Thread Daniel Veillard
On Wed, Aug 06, 2008 at 09:56:22PM +0200, Stefan Behnel wrote: Stefan Behnel wrote: Daniel Veillard wrote: if you have a bit of time then, maybe you can rerun your initial tests with that one, is that possible ? I can try, sure. Just send me a patch that removes the current hash

Re: [xml] Better hash function for dict.c

2008-08-06 Thread Stefan Behnel
Salut Daniel, Daniel Veillard wrote: - the second one is unfortunately not fixeable as is it comes from the key hash definitions themselves: -#define xmlDictComputeKey(dict, name, len) \ -(((dict)-size == MIN_DICT_SIZE) ? \ -

Re: [xml] Better hash function for dict.c

2008-08-06 Thread Daniel Veillard
On Wed, Aug 06, 2008 at 08:25:14AM +0200, Stefan Behnel wrote: Hmm, was that in my patch? Out of the top of my head, shouldn't the last line read xmlDictComputeBigKey(prefix, -1, xmlDictComputeBigKey(name, len, len or something in that line? This looks like a copypaste error to

Re: [xml] Better hash function for dict.c

2008-08-06 Thread Daniel Veillard
On Wed, Aug 06, 2008 at 04:11:13AM -0400, Daniel Veillard wrote: On Wed, Aug 06, 2008 at 08:25:14AM +0200, Stefan Behnel wrote: Another option I looked at is the 'One-at-a-Time Hash' from http://burtleburtle.net/bob/hash/doobs.html , looking at the criterias and the results it looks like a

Re: [xml] Better hash function for dict.c

2008-08-06 Thread Stefan Behnel
Daniel Veillard wrote: Another option I looked at is the 'One-at-a-Time Hash' from http://burtleburtle.net/bob/hash/doobs.html , looking at the criterias and the results it looks like a good hash too, not too expensive and should work well. The page says it's pretty good when inlined, which

Re: [xml] Better hash function for dict.c

2008-08-06 Thread Stefan Behnel
Stefan Behnel wrote: Daniel Veillard wrote: if you have a bit of time then, maybe you can rerun your initial tests with that one, is that possible ? I can try, sure. Just send me a patch that removes the current hash function from SVN and adds the new one, and I will find a way to compare

Re: [xml] Better hash function for dict.c

2008-04-22 Thread Daniel Veillard
On Sun, Apr 20, 2008 at 10:51:38AM +0200, Stefan Behnel wrote: Hi again (and sorry for all the noise), Stefan Behnel wrote: If an application benefits from a different hash function depends on the vocabulary it uses in its XML files. A slow but well distributing hash function performs

Re: [xml] Better hash function for dict.c

2008-04-22 Thread Daniel Veillard
On Sat, Apr 19, 2008 at 06:59:33PM +0200, Stefan Behnel wrote: Hi, Daniel Veillard wrote: On Thu, Apr 17, 2008 at 10:05:03AM -0400, Daniel Veillard wrote: Since you seems to be interested in the performances of the hash algorithm, I tried to drop the string comparisons on lookup when

Re: [xml] Better hash function for dict.c

2008-04-22 Thread Aron Stansvik
2008/4/22, Daniel Veillard [EMAIL PROTECTED]: On Sun, Apr 20, 2008 at 10:51:38AM +0200, Stefan Behnel wrote: Hi again (and sorry for all the noise), Stefan Behnel wrote: If an application benefits from a different hash function depends on the vocabulary it uses in its XML files.

Re: [xml] Better hash function for dict.c

2008-04-20 Thread Stefan Behnel
Hi again (and sorry for all the noise), Stefan Behnel wrote: If an application benefits from a different hash function depends on the vocabulary it uses in its XML files. A slow but well distributing hash function performs much better for large vocabularies (or many different vocabularies),

Re: [xml] Better hash function for dict.c

2008-04-19 Thread Stefan Behnel
Hi, Daniel Veillard wrote: On Thu, Apr 17, 2008 at 10:05:03AM -0400, Daniel Veillard wrote: Since you seems to be interested in the performances of the hash algorithm, I tried to drop the string comparisons on lookup when possible I have an old patch for this which I'm enclosing, but I

Re: [xml] Better hash function for dict.c

2008-04-19 Thread Stefan Behnel
Hi, Daniel Veillard wrote: On Wed, Apr 16, 2008 at 10:53:04PM +0200, Stefan Behnel wrote: I would prefer to see benchmarks done with xmllint directly, to avoid side effect of more string interning than libxml2. Ok, I did some testing with xmllint. I noticed that things can easily get slower

Re: [xml] Better hash function for dict.c

2008-04-18 Thread Stefan Behnel
Hi Daniel, Daniel Veillard wrote: On Thu, Apr 17, 2008 at 10:05:03AM -0400, Daniel Veillard wrote: Since you seems to be interested in the performances of the hash algorithm, I tried to drop the string comparisons on lookup when possible I have an old patch for this which I'm enclosing,

[xml] Better hash function for dict.c

2008-04-17 Thread Stefan Behnel
Hi, long mail, bottom line being: 30% to multiple times faster parsing with a different hash function in dict.c. Keep reading. I did some profiling on lxml using callgrind. lxml configures the parser to use a global per-thread dictionary, so everything it parses ends up in one dictionary instead

Re: [xml] Better hash function for dict.c

2008-04-17 Thread Ralf Junker
Stefan Behnel wrote: The main impact should be on the parser, and only if you use a dictionary for parsing. Could you give a simple example for dictionary parsing? Or does, for example, xmlTextReader use the dictionary automatically? Ralf ___ xml

Re: [xml] Better hash function for dict.c

2008-04-17 Thread Daniel Veillard
On Thu, Apr 17, 2008 at 03:15:18PM +0200, Ralf Junker wrote: Stefan Behnel wrote: The main impact should be on the parser, and only if you use a dictionary for parsing. Could you give a simple example for dictionary parsing? Or does, for example, xmlTextReader use the dictionary

Re: [xml] Better hash function for dict.c

2008-04-17 Thread Daniel Veillard
On Wed, Apr 16, 2008 at 10:53:04PM +0200, Stefan Behnel wrote: Hi, Hi Stefan, long mail, bottom line being: 30% to multiple times faster parsing with a different hash function in dict.c. Keep reading. I did some profiling on lxml using callgrind. lxml configures the parser to use a

Re: [xml] Better hash function for dict.c

2008-04-17 Thread Daniel Veillard
On Thu, Apr 17, 2008 at 10:05:03AM -0400, Daniel Veillard wrote: Since you seems to be interested in the performances of the hash algorithm, I tried to drop the string comparisons on lookup when possible I have an old patch for this which I'm enclosing, but I never applied it since I had

Re: [xml] Better hash function for dict.c

2008-04-17 Thread Stefan Behnel
Hi, Stefan Behnel wrote: http://www.azillionmonkeys.com/qed/hash.html is quite short and readable and seems to do what I was looking for. Some more real-world numbers. I used lxml to parse - using xmlCtxtParseFile() - all .xml and .xsd files that locate found on my hard disc, some 58000 files,