James Ytterstene wrote on 23.06.2010 at 14:41 (+0200):

> If i have the file unchanged from any windows editor the line ending
> is CR only but if someone edit the file it will be changed to CRLF
> (Stupid windows editors but we must use them) If i now try to read the
> file back in libxml2 i will get an extra node at each line only
> containing 0x10.

Most serious editors have an option to choose between DOS, UNIX, and Mac
line endings. Maybe yours does, too.

> If i change the xmlReadFile and add the option XML_PARSE_NOBLANKS i
> can read the file back ok. But when reading about that option i find
> many posts about not to use it, so im confused here.

The question you have to answer is: are whitespace-only text nodes in
your XML significant or not? If they're not significant, there is
nothing wrong with stripping them. Unless, of course, your output is
intended for human consumption. In that case, you have to keep them, or
apply automatic output indenting.

> When i read about libxml2 and how files should be parsed i get the
> feeling that the parser should handle the CRLF when reading files and
> always save the new files with CR only. So the extra CRLF shouldn't be
> any issue but I can be wrong here.

It's a requirement of the XML spec: on input, the parser must translate
every CRLF pair and every lone CR into a single LF.

http://www.w3.org/TR/REC-xml/#sec-line-ends

> Is there any general solution for the parsing of files so the CR CRLF
> doesnt add any extra nodes?

Well yes, the one you already found. Strip whitespace-only text nodes on
parsing, using the appropriate parser or processor option, like in this
case XML_PARSE_NOBLANKS.

-- Michael Ludwig
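
To make the two points above concrete, here is a minimal sketch. It does
not use libxml2; it assumes Python's standard xml.dom.minidom (which sits
on the expat parser) purely to illustrate the behaviour. The sample
document CRLF_XML and the helper strip_blank_text_nodes are hypothetical
names invented for this example. In libxml2 itself, the stripping step
corresponds to passing XML_PARSE_NOBLANKS to xmlReadFile, as discussed
above.

```python
import xml.dom.minidom as minidom

# Hypothetical sample: a small document saved with CRLF line endings.
CRLF_XML = b"<root>\r\n  <item>a</item>\r\n  <item>b</item>\r\n</root>\r\n"

XML_WHITESPACE = " \t\r\n"  # the four whitespace characters XML defines


def strip_blank_text_nodes(node):
    """Recursively remove text nodes that hold only XML whitespace."""
    for child in list(node.childNodes):  # copy: we mutate while iterating
        if child.nodeType == child.TEXT_NODE:
            if child.data.strip(XML_WHITESPACE) == "":
                node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_blank_text_nodes(child)


doc = minidom.parseString(CRLF_XML)
root = doc.documentElement

# The parser has already normalized line ends, so the whitespace-only
# text nodes between the elements contain LF, never CR.
texts = [c.data for c in root.childNodes if c.nodeType == c.TEXT_NODE]
before = len(root.childNodes)

# Only safe if whitespace-only text nodes are not significant in your data.
strip_blank_text_nodes(root)
after = len(root.childNodes)

print("text nodes seen by the application:", [repr(t) for t in texts])
print("children before/after stripping:", before, after)
```

The extra nodes are the line breaks and indentation between elements,
not the CR as such: they appear with LF-only files as well. Whether to
drop them is the significance question raised above.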
