Re: [xml] Reading CDATA
Hi Daniel,

On Thursday, 25. September 2008 17:05, Daniel Veillard wrote:
> On Wed, Sep 17, 2008 at 06:48:37PM +0200, Hartmut Sbosny wrote:
> > Hello, I am fresh to libxml. I want to read an xml file containing
> > the part
> >
> >   <data> <![CDATA[...]]> </data>
> >
> > Currently I use the xmlParseDoc() interface. My first try was to read
> > the data node string via xmlNodeListGetString(). This returns something
>
> You just can't using that API.

Is this a limitation in principle of the xmlParseDoc API or only an accidental gap in the API? Which API is suitable for reading CDATA? Sorry for my clumsy questions, I am new to libxml and XML.

> > From there I could in principle extract the pure ... content by
> > subtracting the trailing white space node string, but it seems to me
> > there should exist an easier libxml way to read the CDATA content?
>
> wei:~/XML -> cat tst.xml
> <data> <![CDATA[...]]> </data>
> wei:~/XML -> xmllint --debug tst.xml
> DOCUMENT
> version=1.0
> URL=tst.xml
> standalone=true
>   ELEMENT data
>     TEXT compact
>       content=
>     CDATA_SECTION
>       content=...
>     TEXT compact
>       content=
> wei:~/XML ->
>
> navigate in the tree and grab the data as ->content from the second
> child of your containing element

I am probably missing the point. Do you mean I should use the command line tool 'xmllint'? That would be rather inconvenient for me. I have (more or less) ready a C program, using the xmlParseDoc API, which reads an XML file of which the CDATA content is only a small piece. As a workaround to get the CDATA I currently evaluate the three strings ___...___ ...___ ___ (in the said symbolic meaning) which I get when I walk through the children of the data element using xmlNodeListGetString(). Is that something I should not rely on?

Thanks for your response
Hartmut

___
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
Re: [xml] Reading CDATA
On Fri, Sep 26, 2008 at 10:31:36AM +0200, Hartmut Sbosny wrote:
> Hi Daniel,
>
> On Thursday, 25. September 2008 17:05, Daniel Veillard wrote:
> > On Wed, Sep 17, 2008 at 06:48:37PM +0200, Hartmut Sbosny wrote:
> > > Hello, I am fresh to libxml. I want to read an xml file containing
> > > the part
> > >
> > >   <data> <![CDATA[...]]> </data>
> > >
> > > Currently I use the xmlParseDoc() interface. My first try was to
> > > read the data node string via xmlNodeListGetString(). This returns
> > > something
> >
> > You just can't using that API.
>
> Is this a limitation in principle of the xmlParseDoc API or only an
> accidental gap in the API? Which API is suitable for reading CDATA?
> Sorry for my clumsy questions, I am new to libxml and XML.

No API lack. You use an API to dump the content of a LIST of nodes when you want the content of a SINGLE node. There are a zillion ways to get it, like accessing the node->content pointer directly or using the API that gets the content of a single node, like xmlNodeGetContent().

> > wei:~/XML -> xmllint --debug tst.xml
> > DOCUMENT
> > version=1.0
> > URL=tst.xml
> > standalone=true
> >   ELEMENT data
> >     TEXT compact
> >       content=
> >     CDATA_SECTION
> >       content=...
> >     TEXT compact
> >       content=
> > wei:~/XML ->
> >
> > navigate in the tree and grab the data as ->content from the second
> > child of your containing element
>
> I am probably missing the point. Do you mean I should use the command
> line tool 'xmllint'?

Hum, no, I just tried to get you to understand that the data model is a tree and you need to walk that tree ... and xmllint --debug is a convenient way to see this tree.

Daniel

--
Daniel Veillard      | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
[EMAIL PROTECTED]    | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/
Re: [xml] XQuery
On Thu, 2008-09-25 at 19:06 +0200, Daniel Veillard wrote:
[...]
> XQuery doesn't make that much sense when playing with a single
> document,

There are several implementations of XQuery that work on a single document and that people say are useful.

> my feeling is that it's more fit for a set/database of documents,
> i.e. libxml2 might be used to implement an XQuery engine on top of
> some database but just for within libxml2 it's not worth it.

As it stands today I don't know how good a fit libxml would be -- for XPath 2, nodes are typed, so you'd want to make some changes. Certainly you could use libxml's XMLReader API to build data model instances, though, and/or tie in to the W3C XML Schema validation.

dbxml (which uses XQilla) is a viable alternative for many people, and the FLWOR Foundation is funding another, both in C++. If you are doing work in KDE, there's also an implementation from TrollTech (Nokia now).

Liam

--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org
Re: [xml] Reading CDATA
On Fri, 26. September 2008 11:15, Daniel Veillard wrote:
> On Fri, Sep 26, 2008 at 10:31:36AM +0200, Hartmut Sbosny wrote:
> > Hi Daniel,
> >
> > On Thursday, 25. September 2008 17:05, Daniel Veillard wrote:
> > > On Wed, Sep 17, 2008 at 06:48:37PM +0200, Hartmut Sbosny wrote:
> > > > Hello, I am fresh to libxml. I want to read an xml file
> > > > containing the part
> > > >
> > > >   <data> <![CDATA[...]]> </data>
> > > >
> > > > Currently I use the xmlParseDoc() interface. My first try was to
> > > > read the data node string via xmlNodeListGetString(). This
> > > > returns something
> > >
> > > You just can't using that API.
> >
> > Is this a limitation in principle of the xmlParseDoc API or only an
> > accidental gap in the API? Which API is suitable for reading CDATA?
> > Sorry for my clumsy questions, I am new to libxml and XML.
>
> No API lack. You use an API to dump the content of a LIST of nodes
> when you want the content of a SINGLE node. There are a zillion ways
> to get it, like accessing the node->content pointer directly or using
> the API that gets the content of a single node, like
> xmlNodeGetContent().

Ok, now I understand (I related "You just can't using that API" to xmlParseDoc(), not to xmlNodeListGetString()).

Many thanks
Hartmut
Re: [xml] UTF-8 decoding bug in HTML parser
Hi Daniel,

> Reusing the XML code for this seems to work fine for me and the
> regression test, but you probably have a more extensive HTML test
> suite than me ;-) so raise the problem if there is a regression !
> Will commit to SVN with the test case,

Thanks, I'll check it out. I think this greatly helps the usability of libxml2 for parsing HTML documents.

Cheers,
Michael

--
Print XML with Prince!
http://www.princexml.com
Re: [xml] UTF-8 decoding bug in HTML parser
Hi Daniel,

> Reusing the XML code for this seems to work fine for me and the
> regression test, but you probably have a more extensive HTML test
> suite than me ;-) so raise the problem if there is a regression !

Actually, I just remembered one more issue: null bytes in HTML documents terminate the parser, with no error or warning messages. See the attached test document, which has two paragraphs separated by a null.

Best regards,
Michael

Hello, world!
This will not appear.
Re: [xml] UTF-8 decoding bug in HTML parser
On Fri, Sep 26, 2008 at 08:24:33PM +1000, Michael Day wrote:
> Hi Daniel,
>
> > Reusing the XML code for this seems to work fine for me and the
> > regression test, but you probably have a more extensive HTML test
> > suite than me ;-) so raise the problem if there is a regression !
> > Will commit to SVN with the test case,
>
> Thanks, I'll check it out. I think this greatly helps the usability
> of libxml2 for parsing HTML documents.

The patch doesn't work for the push parser though, and if I add it to push a lot of things break, so it's not final ...

Daniel
Re: [xml] UTF-8 decoding bug in HTML parser
On Fri, Sep 26, 2008 at 08:29:44PM +1000, Michael Day wrote:
> Hi Daniel,
>
> > Reusing the XML code for this seems to work fine for me and the
> > regression test, but you probably have a more extensive HTML test
> > suite than me ;-) so raise the problem if there is a regression !
>
> Actually, I just remembered one more issue: null bytes in HTML
> documents terminate the parser, with no error or warning messages.
> See the attached test document, which has two paragraphs separated
> by a null.

That's gonna be harder to handle, the zero is used in places to indicate the end of the input buffer... I don't expect something trivial there.

Daniel
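The effect described above can be illustrated with a toy scanner. This is not libxml2's parser code, only a sketch of why treating a zero byte as the end-of-input marker silently drops everything after an embedded null; both function names are hypothetical.

```c
#include <stddef.h>

/* Toy scanner that treats a zero byte as "end of input", the way code
 * written for NUL-terminated buffers does. It returns the number of
 * bytes it consumed before stopping. */
size_t scan_until_zero(const char *buf)
{
    size_t n = 0;
    while (buf[n] != '\0')
        n++;
    return n;
}

/* Length-aware variant: it is told how long the buffer is, so it can
 * step over embedded zero bytes and count the remaining bytes. */
size_t scan_with_length(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\0')
            n++;
    }
    return n;
}
```

With a buffer holding two paragraphs separated by a zero byte, the first function stops at the end of the first paragraph without any indication that more input followed, while the second sees both paragraphs. Switching from the first style to the second is a change to every place that relies on the sentinel, which fits the remark that the fix is not expected to be trivial.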
Re: [xml] libxml2-2.7.1, solaris 8, and xmlDictComputeBigKey
On Thu, Sep 25, 2008 at 6:43 PM, Daniel Veillard wrote:
> On Tue, Sep 02, 2008 at 11:09:38AM -0400, Matt Goebel wrote:
> > Hi,
> > The changes around line 266 in dict.c which relate to
> > xmlDictComputeBigKey, at least on solaris, require the inclusion of
> > sys/int_types.h to pick up the definition of uint8_t and uint16_t.
>
> hum, isn't there a more common header allowing to get those included ?
> Seems stdint.h should allow this and we already have
>
>   #ifdef HAVE_STDINT_H
>   #include <stdint.h>
>   #elif defined(WIN32)
>   typedef unsigned __int32 uint32_t;
>   #endif
>
> does solaris really not have stdint.h ? I can't directly include
> sys/int_types.h , adding autodetect in configure.in should be possible
> but i would prefer to receive a tested patch in that case.

Sorry for the blind shot, but doesn't Solaris's inttypes.h include int_types.h? If it does, then we can just include inttypes.h instead of stdint.h, or both (after checking for their existence with autoconf, of course) for paranoia reasons.

From the Autoconf Manual
(http://www.gnu.org/software/autoconf/manual/html_node/Header-Portability.html):

  inttypes.h vs. stdint.h
    The C99 standard says that inttypes.h includes stdint.h, so there's
    no need to include stdint.h separately in a standard environment.
    Some implementations have inttypes.h but not stdint.h (e.g.,
    Solaris 7), but we don't know of any implementation that has
    stdint.h but not inttypes.h.

--
Andrew W. Nosenko [EMAIL PROTECTED]
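The suggestion above could look roughly like the following sketch, which prefers inttypes.h and falls back to stdint.h. It is not the patch that went into dict.c. HAVE_INTTYPES_H is assumed to come from an Autoconf header check and is defined by hand here only so the fragment compiles on its own; pack16() is a hypothetical helper that merely exercises the types.

```c
/* Assumption: in a real build, config.h generated by Autoconf (for
 * example via AC_CHECK_HEADERS([inttypes.h stdint.h])) defines these
 * macros. Defined by hand here so the sketch is self-contained. */
#ifndef HAVE_INTTYPES_H
#define HAVE_INTTYPES_H 1
#endif

#if defined(HAVE_INTTYPES_H)
#include <inttypes.h>   /* C99: also provides everything in stdint.h */
#elif defined(HAVE_STDINT_H)
#include <stdint.h>
#elif defined(WIN32)
/* Old MSVC without either header: compiler-specific sized types. */
typedef unsigned __int8  uint8_t;
typedef unsigned __int16 uint16_t;
typedef unsigned __int32 uint32_t;
#endif

/* Hypothetical helper using the fixed-width types the thread is about:
 * combine two bytes into one 16-bit value, high byte first. */
uint16_t pack16(uint8_t hi, uint8_t lo)
{
    return (uint16_t) (((uint16_t) hi << 8) | lo);
}
```

Checking inttypes.h first follows the Autoconf manual's observation that some systems ship it without stdint.h, while the reverse case is not known.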
Re: [xml] libxml2 2.7.1 breaks XML serialisation of HTML trees
On 25/09/2008, Daniel Veillard wrote:
> Stephan, Martin, could you check the enclosed patch ? I'm committing
> it to SVN head too but it's probably easier to review that way.

I built trunk and had a play around; this handles my use case, thanks!

A couple of concerns. First, it's not *totally* clear which options should be used together. That should at least be documented, and maybe clearly defined in the code as well. The patch has a bunch of additions like this:

     xmlSaveCtxtInit(ctxt);
    +ctxt.options |= XML_SAVE_AS_XML;

Move that common case into xmlSaveCtxtInit and overwrite it afterwards in xmlNewSaveCtxt for the exception of wanting to save as HTML? Or put logic in xmlNewSaveCtxt to make the set of options on the ctxt sane? Or could the format parameter in those functions be safely upgraded to full options? The signature is the same and XML_SAVE_FORMAT is 1 anyway, but I guess if people have been passing some other non-zero value for formatting, that would break compatibility.

Finally, having XML_SAVE_AS_HTML makes it seem like you could save any xml-flavoured-html document in non-xml-flavoured form, but that's not quite the case. One thing I found was that the overloading of XML_CDATA_SECTION_NODE to also be HTML_PRESERVE_NODE means the contents get output raw; see HTMLtree.c lines 838-843 in htmlNodeDumpFormatOutput.

> Basically it adds 3 parsing options, and for the old entry points
> xmlDump* not xmlSave based it forces the XML_SAVE_AS_XML bypassing
> the doc type in case of HTML documents. that should fix Stephan
> problem and also provide ways to do things with xmlSave when
> available. For the 'problem' of the added meta an XML_SAVE_IMMUTABLE
> option could be added that sounds more generic, but i'm not adding
> this in the patch to not complicate things.

A don't-fiddle-with-the-tree option sounds like a possibility, though I do find the duplication of xml:lang to lang useful.

> I hope i didn't miss any old entry point which behaviour was modified
> in 2.7.1, and not missing places where the new flags should be
> checked too,

This I haven't thoroughly tested, however.

Thanks for working on this,
Martin
[xml] libxml - source file list for cross compilation
I need to specify the sources required to build a static library with a tiny footprint.

I need the following:
  - xmlparser (non file based, by creating a parser context and then using xmlParseChunk)
  - DOM
  - C14n
  - xmlwriter (only the ability to write data to a buffer)

I do not need:
  - SAX
  - SAX2
  - File IO (as my environment has a different file system)
  - exceptions
  - debug support
  - xpath
  - xpointer
  - xinclude
  - regex
  - xtree

My goal is to get a library smaller than 500kb. Is this achievable with the features I need? It would be great if I could get a list of the sources (.c files) that would be needed. It's been really hard to make a minimal build, taking into account the dependencies and the fact that I cannot use ./configure.

Thanks in advance