James Ytterstene wrote on 23.06.2010 at 14:41 (+0200):

> If I have the file unchanged from any Windows editor, the line ending
> is CR only, but if someone edits the file it will be changed to CRLF
> (stupid Windows editors, but we must use them). If I now try to read
> the file back in libxml2, I get an extra node at each line containing
> only 0x10.

Most serious editors have an option to go with DOS or UNIX or Mac line
endings. Maybe yours does, too.

> If I change the xmlReadFile call and add the option XML_PARSE_NOBLANKS,
> I can read the file back OK. But when reading about that option I find
> many posts saying not to use it, so I'm confused here.

The question you have to answer: Are whitespace-only text nodes in your
XML significant or not? If they're not significant, nothing wrong with
stripping them. Unless, of course, your output is intended for human
consumption. In that case, you have to keep them, or apply automatic
output indenting.
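
To illustrate what "automatic output indenting" means, here is a minimal
sketch. It uses Python's standard-library xml.dom.minidom rather than
libxml2, purely so that it runs without extra dependencies; the sample
document is made up.

```python
import xml.dom.minidom

# A document without any whitespace-only text nodes, as you would have
# after stripping them on parsing.
doc = xml.dom.minidom.parseString(b"<root><item>a</item><item>b</item></root>")

# Let the serializer add line breaks and indentation for human readers.
pretty = doc.toprettyxml(indent="  ")
print(pretty)
```

The indentation is generated by the serializer on output, so it does not
matter that the parsed tree holds no whitespace-only text nodes.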

> When I read about libxml2 and how files should be parsed, I get the
> feeling that the parser should handle the CRLF when reading files and
> always save the new files with CR only. So the extra CRLF shouldn't be
> any issue, but I could be wrong here.

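Line-end handling is a requirement of the XML spec: a conforming parser
must translate every CRLF pair and every lone CR to a single LF before
passing the text on to the application: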

http://www.w3.org/TR/REC-xml/#sec-line-ends
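
A small sketch of that normalization in action. It uses Python's
standard-library xml.dom.minidom (built on expat) instead of libxml2,
only to keep the example self-contained; the sample documents and the
helper name first_text are made up for illustration.

```python
import xml.dom.minidom

# The same document written with three different line-ending conventions.
variants = {
    "LF": b"<root>\n<item>a</item>\n</root>",
    "CRLF": b"<root>\r\n<item>a</item>\r\n</root>",
    "CR": b"<root>\r<item>a</item>\r</root>",
}

def first_text(data):
    """Return the data of the first child (a text node) of the root element."""
    doc = xml.dom.minidom.parseString(data)
    return doc.documentElement.firstChild.data

# Collect what the application actually sees for each variant.
normalized = {name: first_text(data) for name, data in variants.items()}
for name, text in normalized.items():
    print(name, repr(text))
```

All three variants should report the same single LF character. The
whitespace-only text node is still there, though: normalization changes
which characters it contains, not whether it exists.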

> Is there any general solution for the parsing of files so that CR or
> CRLF doesn't add any extra nodes?

Well yes, the one you already found. Strip whitespace-only text nodes on
parsing, using the appropriate parser or processor option, like in this
case XML_PARSE_NOBLANKS.
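
As a rough sketch of what stripping whitespace-only text nodes amounts
to, here is the same idea done by hand after parsing. It uses Python's
standard-library xml.dom.minidom, not libxml2, and the helper
strip_blank_text is a hypothetical name; it is not a claim about how
XML_PARSE_NOBLANKS is implemented internally.

```python
import xml.dom.minidom

def strip_blank_text(node):
    """Recursively remove text nodes that contain only whitespace."""
    # Iterate over a copy, because we modify the child list while walking it.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_blank_text(child)

# A made-up document saved with CRLF line endings and indentation.
doc = xml.dom.minidom.parseString(
    b"<root>\r\n  <item>a</item>\r\n  <item>b</item>\r\n</root>"
)
root = doc.documentElement

before = len(root.childNodes)   # elements plus whitespace-only text nodes
strip_blank_text(root)
after = len(root.childNodes)    # elements only
print(before, after)
```

Here the root element goes from five child nodes (three whitespace-only
text nodes around two elements) to just the two elements. Text that
contains anything besides whitespace, like the "a" and "b" above, is
left alone.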

-- 
Michael Ludwig
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml
