On Wed, Aug 06, 2008 at 08:25:14AM +0200, Stefan Behnel wrote: > Hmm, was that in my patch? Out of the top of my head, shouldn't the last line > read > > xmlDictComputeBigKey(prefix, -1, xmlDictComputeBigKey(name, len, len)))) > > or something in that line? This looks like a copy&paste error to me...
err, yes, actually we have #define xmlDictComputeKey(dict, name, len) \ (((dict)->size == MIN_DICT_SIZE) ? \ xmlDictComputeFastKey(name, len) : \ xmlDictComputeBigKey(name, len, len)) \ #define xmlDictComputeQKey(dict, prefix, name, len) \ (((prefix) == NULL) ? \ (xmlDictComputeKey(dict, name, len)) : \ (((dict)->size == MIN_DICT_SIZE) ? \ xmlDictComputeFastQKey(prefix, name, len) : \ xmlDictComputeBigKey(prefix, xmlStrlen(prefix), \ xmlDictComputeBigKey(name, len, len)))) > Anyway: > > > the problem is that basically if you compute the key for a QName > > as "a:b" you can get 2 different answers, one if you accessed it using > > "a:b" directly and hence xmlDictComputeKey() or if using "a" prefix and > > "b" name, given the algorithm the key are not the same, and it is a key > > property of the dictionary to always return the same exact pointer for > > the same string. This breaks that property. > > True, I didn't know about this property. And the 4-byte-at-once property will > really make this very hard to achieve. yes, that's the core of the problem > A way I see to fix this is to search the string for the first ':' and always > calculate the hash separately for the part before and after the ':', not > including the ':' itself. That should not break hashing namespace URIs either > (AFAIR, at the least the XML namespace gets hashed at some point). Something > like > > int len = strlen(s) > char* prefix_end = strchr(s, ':') > if (prefix_end) > h = xmlDictComputeBigKey(s, prefix_end-s, > xmlDictComputeBigKey(prefix_end+1, len-(prefix_end-s-1), > len-(prefix_end-s-1))) > else > h = xmlDictComputeBigKey(s, len, len) Another option I looked at is the 'One-at-a-Time Hash' from http://burtleburtle.net/bob/hash/doobs.html , looking at the criterias and the results it looks like a good hash too, not too expensive and should work well. I will try to make a patch using this this morning, if you have a bit of time then, maybe you can rerun your initial tests with that one, is that possible ? Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ [EMAIL PROTECTED] | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/ _______________________________________________ xml mailing list, project page http://xmlsoft.org/ xml@gnome.org http://mail.gnome.org/mailman/listinfo/xml