On Tue, Mar 15, 2011 at 08:35:29PM -0600, Dan McRae wrote:
> Forgive me if this is a rookie user error, but I ran across an odd
> situation with respect to the libxml2 parser. It appears that when
> it is parsing a series of numerical values separated by whitespace
> (e.g. "... 293.18 218.92 289.13 ..."), it is possible for it to grab
> just a portion of the series resulting in a number being separated
> from the rest of its digits.

  I'm not sure what you mean; libxml2 doesn't look at the content
except in the specific case where type validation is done using
XSD or RNG.

[...]
> It appears to be completely random as to whether the split happens
> within a value or not, except that the array ends with character
> number 250 (suspiciously close to 256). The more white space
> characters, the less likely it is to happen within a value, so I can
> reformat the numbers and make the error go away. However, this
> doesn't leave me with a very comfortable feeling. Plus, I have a
> hard time believing that we wouldn't have seen this a long time ago
> if it was truly this random.
> 
> Is this something with which you are familiar? I searched the web
> but didn't see any results that seem to address this situation. I'm
> wondering if there's a way to instruct libxml2 to not separate
> contiguous non-whitespace characters. I hope I don't have to try to
> reassemble separated values myself.

  What exactly are you doing? If you're parsing with SAX, the characters
coming from a single node may arrive in multiple callbacks. The API
never guarantees that you get everything in a single chunk, so you have to
reassemble them yourself. That's the principle of a streaming API: you could
have a single text node of a terabyte, and libxml2 SAX will parse it using
constant memory, but you will have to analyze the data on the fly.
If you find SAX too hard, use the Reader (xmlTextReader) API; I discourage
use of SAX except for very specific kinds of processing.
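A minimal sketch of the reassembly described above, assuming the text content is a list of whitespace-separated decimal numbers. The names `struct num_acc`, `acc_feed()`, `acc_finish()`, `MAX_TOKEN` and `MAX_VALUES` are hypothetical illustrations, not part of libxml2. The idea is to keep the unfinished number at the end of one chunk and complete it with the next chunk. With libxml2's SAX interface, you would call `acc_feed()` from the handler installed in the `characters` field of `xmlSAXHandler` (which receives `void *ctx, const xmlChar *ch, int len`), and call `acc_finish()` from the end-element callback.

```c
#include <ctype.h>
#include <stdlib.h>

#define MAX_TOKEN 64    /* hypothetical limit on the length of one number */
#define MAX_VALUES 1024 /* hypothetical capacity of the output array */

/* Hypothetical accumulator: turns text delivered in arbitrary chunks into doubles. */
struct num_acc {
    char pending[MAX_TOKEN]; /* characters of a number that may continue in the next chunk */
    size_t plen;             /* number of characters currently in pending */
    double values[MAX_VALUES];
    size_t count;
};

/* Convert the pending characters (if any) into a value and reset the buffer. */
static void acc_flush(struct num_acc *acc) {
    if (acc->plen == 0)
        return;
    acc->pending[acc->plen] = '\0';
    if (acc->count < MAX_VALUES)
        acc->values[acc->count++] = strtod(acc->pending, NULL);
    acc->plen = 0;
}

/* Feed one chunk, e.g. the (ch, len) pair passed to a SAX characters callback.
 * A number cut off at the end of the chunk stays in pending until whitespace
 * or the end of the element arrives. */
static void acc_feed(struct num_acc *acc, const char *ch, int len) {
    for (int i = 0; i < len; i++) {
        unsigned char c = (unsigned char)ch[i];
        if (isspace(c)) {
            acc_flush(acc); /* whitespace terminates the current number */
        } else if (acc->plen < MAX_TOKEN - 1) {
            acc->pending[acc->plen++] = (char)c;
        }
        /* characters beyond MAX_TOKEN - 1 are silently dropped in this sketch */
    }
}

/* Call once the element ends so the last number is not lost. */
static void acc_finish(struct num_acc *acc) {
    acc_flush(acc);
}
```

For example, feeding "293.18 218." and then "92 289.13" into a zero-initialized `struct num_acc` and calling `acc_finish()` yields three values, 293.18, 218.92 and 289.13, even though the second number was split across two chunks. Memory use is bounded by `MAX_TOKEN` for the partial number, however large the text node is, which matches the constant-memory property of the streaming API.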

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[email protected]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
