On Tue, Mar 15, 2011 at 08:35:29PM -0600, Dan McRae wrote:
> Forgive me if this is a rookie user error, but I ran across an odd
> situation with respect to the libxml2 parser. It appears that when
> it is parsing a series of numerical values separated by whitespace
> (e.g. "... 293.18 218.92 289.13 ..."), it is possible for it to grab
> just a portion of the series resulting in a number being separated
> from the rest of its digits.

I'm not sure what you mean: libxml2 doesn't look at the content, except
for the specific case where type validation is done using XSD or RNG.

[...]
> It appears to be completely random as to whether the split happens
> within a value or not, except that the array ends with character
> number 250 (suspiciously close to 256). The more white space
> characters, the less likely it is to happen within a value, so I can
> reformat the numbers and make the error go away. However, this
> doesn't leave me with a very comfortable feeling. Plus, I have a
> hard time believing that we wouldn't have seen this a long time ago
> if it was truly this random.
>
> Is this something with which you are familiar? I searched the web
> but didn't see any returns that seems to address this situation. I'm
> wondering if there's a way to instruct libxml2 to not separate
> contiguous non-whitespace characters. I hope I don't have to try to
> reassemble separated values myself.

What are you doing exactly? If you're parsing with SAX, the characters
coming from a single node may arrive in multiple callbacks. The API
never guarantees that you get everything in a single chunk; you have to
reassemble. That's the principle of the streaming API: you could have a
single text node of a terabyte, and the libxml2 SAX parser will parse it
using constant memory, but you will have to do the data analysis on the
fly.

If you find SAX too hard, use the Reader; I discourage the use of SAX
except for very specific kinds of processing.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[email protected]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
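
To illustrate the reassembly a SAX consumer has to do, here is a minimal
sketch. It uses Python's standard xml.sax module rather than libxml2's C
SAX interface, purely to keep it self-contained; the principle is the
same. The element name "values", the class ValueCollector and the helper
parse_values are hypothetical names chosen for the example. The input is
fed to the parser in small pieces so that a text node is likely to
arrive in more than one characters() callback.

```python
import xml.sax


class ValueCollector(xml.sax.ContentHandler):
    """Collects the numbers inside <values> elements (hypothetical name)."""

    def __init__(self):
        super().__init__()
        self.values = []      # parsed numbers from all <values> elements
        self._chunks = None   # text pieces of the current element, or None

    def startElement(self, name, attrs):
        if name == "values":
            self._chunks = []

    def characters(self, content):
        # May be called several times for one text node, so only store
        # the piece here and never try to parse it yet.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "values" and self._chunks is not None:
            # Only now is the text complete: join the pieces, then split.
            text = "".join(self._chunks)
            self.values.extend(float(tok) for tok in text.split())
            self._chunks = None


def parse_values(xml_text, piece_size=7):
    """Feed xml_text to the parser in small pieces; return the numbers."""
    handler = ValueCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for i in range(0, len(xml_text), piece_size):
        parser.feed(xml_text[i:i + piece_size])
    parser.close()
    return handler.values


if __name__ == "__main__":
    print(parse_values("<data><values>293.18 218.92 289.13</values></data>"))
```

The point of the design is that the whitespace split happens in
endElement, on the joined text, so it does not matter where the parser
chose to cut the character data.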
