Re: [xml] Reading CDATA

2008-09-26 Thread Hartmut Sbosny
Hi Daniel,

On Thursday, 25. September 2008 17:05, Daniel Veillard wrote:
 On Wed, Sep 17, 2008 at 06:48:37PM +0200, Hartmut Sbosny wrote:
  Hello,
  I am fresh to libxml. I want to read an xml file containing
  the part
  <data>
  <![CDATA[...]]>
  </data>
 
  Currently I use the xmlParseDoc() interface. My first try was to read the
  data node string via xmlNodeListGetString(). This returns something

   You just can't using that API.

Is this a fundamental limitation of the xmlParseDoc API or just an
accidental gap in the API? Which API is suitable for reading CDATA? Sorry
for my clumsy questions, I am new to libxml and XML.



   From there I could in principle extract the pure ... content by
   subtracting the trailing whitespace node string, but it seems to me
   there should be an easier libxml way to read the CDATA content?

 wei:~/XML - cat tst.xml
 <data>
 <![CDATA[...]]>
 </data>

 wei:~/XML - xmllint --debug tst.xml
 DOCUMENT
 version=1.0
 URL=tst.xml
 standalone=true
   ELEMENT data
 TEXT compact
   content=
 CDATA_SECTION
   content=...
 TEXT compact
   content=
 wei:~/XML -

  navigate in the tree and grab the data as ->content from the second
 child of your containing element

I am probably missing the point. Do you mean I should use the command-line
tool 'xmllint'? That would be rather inconvenient for me. I have (more or
less) finished a C program, using the xmlParseDoc API, which reads an XML
file in which the CDATA content is only a small part. As a workaround to
get the CDATA I currently evaluate the three strings
___...___
...___
___
(in the symbolic notation described earlier), which I get when I walk
through the children of the data element using xmlNodeListGetString().
Is that something I should not rely on?

Thanks for your response
Hartmut
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml


Re: [xml] Reading CDATA

2008-09-26 Thread Daniel Veillard
On Fri, Sep 26, 2008 at 10:31:36AM +0200, Hartmut Sbosny wrote:
 Hi Daniel,
 
 On Thursday, 25. September 2008 17:05, Daniel Veillard wrote:
  On Wed, Sep 17, 2008 at 06:48:37PM +0200, Hartmut Sbosny wrote:
   Hello,
   I am fresh to libxml. I want to read an xml file containing
   the part
 <data>
 <![CDATA[...]]>
 </data>
  
   Currently I use the xmlParseDoc() interface. My first try was to read the
   data node string via xmlNodeListGetString(). This returns something
 
You just can't using that API.
 
 Is this a fundamental limitation of the xmlParseDoc API or just an
 accidental gap in the API? Which API is suitable for reading CDATA? Sorry
 for my clumsy questions, I am new to libxml and XML.

  No API lack. You are using an API that dumps the content of a LIST of
nodes when you want the content of a SINGLE node.
  There are a zillion ways to get it, like accessing the node->content
pointer directly or using the API that gets the content of a single
node, xmlNodeGetContent().

  wei:~/XML - xmllint --debug tst.xml
  DOCUMENT
  version=1.0
  URL=tst.xml
  standalone=true
ELEMENT data
  TEXT compact
content=
  CDATA_SECTION
content=...
  TEXT compact
content=
  wei:~/XML -
 
   navigate in the tree and grab the data as ->content from the second
  child of your containing element
 
 I probably miss the point. Do you mean I should use the command line 
 tool 'xmllint'?

  Hum, no, I just tried to get you to understand that the data model is
a tree and you need to walk that tree ... and xmllint --debug is a
convenient way to see this tree.

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[EMAIL PROTECTED]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


Re: [xml] XQuery

2008-09-26 Thread Liam R E Quin
On Thu, 2008-09-25 at 19:06 +0200, Daniel Veillard wrote:
[...]
 XQuery doesn't make that much sense when playing with a single document,

There are several implementations of XQuery that work on a single
document and that people say are useful.


 my feeling is that it's more fit for a set/database of documents,
 i.e. libxml2 might be used to implement an XQuery engine on top of
 some database but just for within libxml2 it's not worth it.

As it stands today I don't know how good a fit libxml would be --
for XPath 2, nodes are typed, so you'd want to make some changes.
Certainly you could use libxml's XMLReader API to build data model
instances, though, and/or tie in to the W3C XML Schema validation.

dbxml (which uses XQilla) is a viable alternative for many people,
and the FLWOR Foundation is funding another, both in C++.  If you
are doing work in KDE, there's also an implementation from TrollTech
(Nokia now).

Liam


-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org



Re: [xml] Reading CDATA

2008-09-26 Thread Hartmut Sbosny
On Fri, 26. September 2008 11:15, Daniel Veillard wrote:
 On Fri, Sep 26, 2008 at 10:31:36AM +0200, Hartmut Sbosny wrote:
  Hi Daniel,
 
  On Thursday, 25. September 2008 17:05, Daniel Veillard wrote:
   On Wed, Sep 17, 2008 at 06:48:37PM +0200, Hartmut Sbosny wrote:
Hello,
I am fresh to libxml. I want to read an xml file containing
the part
<data>
<![CDATA[...]]>
</data>
   
Currently I use the xmlParseDoc() interface. My first try was to read
the data node string via xmlNodeListGetString(). This returns
something
  
 You just can't using that API.
 
  Is this a fundamental limitation of the xmlParseDoc API or just an
  accidental gap in the API? Which API is suitable for reading CDATA? Sorry
  for my clumsy questions, I am new to libxml and XML.

   No API lack. You are using an API that dumps the content of a LIST of
 nodes when you want the content of a SINGLE node.
   There are a zillion ways to get it, like accessing the node->content
 pointer directly or using the API that gets the content of a single
 node, xmlNodeGetContent().

OK, now I understand (I took "You just can't using that API" to refer to
xmlParseDoc(), not to xmlNodeListGetString()).

Many thanks
Hartmut


Re: [xml] UTF-8 decoding bug in HTML parser

2008-09-26 Thread Michael Day

Hi Daniel,


  Reusing the XML code for this seems to work fine for me and the
regression test, but you probably have a more extensive HTML test
suite than me ;-) so raise the problem if there is a regression!
Will commit to SVN with the test case,


Thanks, I'll check it out. I think this greatly helps the usability of 
libxml2 for parsing HTML documents.


Cheers,

Michael

--
Print XML with Prince!
http://www.princexml.com


Re: [xml] UTF-8 decoding bug in HTML parser

2008-09-26 Thread Michael Day

Hi Daniel,


  Reusing the XML code for this seems to work fine for me and the
regression test, but you probably have a more extensive HTML test
suite than me ;-) so raise the problem if there is a regression!


Actually, I just remembered one more issue: null bytes in HTML documents 
terminate the parser, with no error or warning messages. See the 
attached test document, which has two paragraphs separated by a null.


Best regards,

Michael

--
Print XML with Prince!
http://www.princexml.com
Hello, world!

This will not appear.


Re: [xml] UTF-8 decoding bug in HTML parser

2008-09-26 Thread Daniel Veillard
On Fri, Sep 26, 2008 at 08:24:33PM +1000, Michael Day wrote:
 Hi Daniel,

   Reusing the XML code for this seems to work fine for me and the
 regression test, but you probably have a more extensive HTML test
 suite than me ;-) so raise the problem if there is a regression!
 Will commit to SVN with the test case,

 Thanks, I'll check it out. I think this greatly helps the usability of  
 libxml2 for parsing HTML documents.

  The patch doesn't work for the push parser though, and if I add it to
the push parser a lot of things break, so it's not final ...

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[EMAIL PROTECTED]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


Re: [xml] UTF-8 decoding bug in HTML parser

2008-09-26 Thread Daniel Veillard
On Fri, Sep 26, 2008 at 08:29:44PM +1000, Michael Day wrote:
 Hi Daniel,

   Reusing the XML code for this seems to work fine for me and the
 regression test, but you probably have a more extensive HTML test
 suite than me ;-) so raise the problem if there is a regression!

 Actually, I just remembered one more issue: null bytes in HTML documents  
 terminate the parser, with no error or warning messages. See the  
 attached test document, which has two paragraphs separated by a null.

  that's gonna be harder to handle, the zero is used in places to
indicate the end of the input buffer... I don't expect something trivial
there.
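
To illustrate why an embedded zero byte is awkward when zero also marks the
end of the input, here is a small sketch in plain C. It is not libxml2 code:
the two counting functions are hypothetical and only contrast a scanner that
stops at the first zero byte with one that tracks an explicit length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count "<p>" openings, treating a 0 byte as the end of the input. */
size_t count_p_sentinel(const char *buf)
{
    size_t n = 0;
    const char *p;

    assert(buf != NULL);
    for (p = buf; *p != '\0'; p++) {
        /* Short-circuit evaluation keeps the reads inside the terminated string. */
        if (p[0] == '<' && p[1] == 'p' && p[2] == '>')
            n++;
    }
    return n;
}

/* Count "<p>" openings over an explicit length; embedded 0 bytes are just data. */
size_t count_p_length(const char *buf, size_t len)
{
    size_t n = 0;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i + 3 <= len; i++) {
        if (memcmp(buf + i, "<p>", 3) == 0)
            n++;
    }
    return n;
}
```

For a buffer holding two paragraphs separated by a zero byte, the first
function sees only the first paragraph while the second sees both, which
matches the silent truncation described in the report.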

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[EMAIL PROTECTED]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


Re: [xml] libxml2-2.7.1, solaris 8, and xmlDictComputeBigKey

2008-09-26 Thread Andrew W. Nosenko
On Thu, Sep 25, 2008 at 6:43 PM, Daniel Veillard [EMAIL PROTECTED] wrote:
 On Tue, Sep 02, 2008 at 11:09:38AM -0400, Matt Goebel wrote:

 Hi,

   The changes around line 266 in dict.c which relate to
 xmlDictComputeBigKey, at least on Solaris, require the inclusion
 of sys/int_types.h to pick up the definition of uint8_t and uint16_t.

  Hum, isn't there a more common header that provides those?
 It seems stdint.h should allow this, and we already have

 #ifdef HAVE_STDINT_H
 #include <stdint.h>
 #elif defined(WIN32)
 typedef unsigned __int32 uint32_t;
 #endif

  Does Solaris really not have stdint.h?
 I can't directly include sys/int_types.h. Adding autodetection in
 configure.in should be possible, but I would prefer to receive a tested
 patch in that case.

Sorry for the shot in the dark, but doesn't Solaris's inttypes.h
include int_types.h?

If it does, then we can just include inttypes.h instead of stdint.h,
or include both (after checking for their existence with autoconf, of
course) to be on the safe side.

From the Autoconf Manual:
(http://www.gnu.org/software/autoconf/manual/html_node/Header-Portability.html)

inttypes.h vs. stdint.h
The C99 standard says that inttypes.h includes stdint.h, so
there's no need to include stdint.h separately in a standard
environment. Some implementations have inttypes.h but not
stdint.h (e.g., Solaris 7), but we don't know of any
implementation that has stdint.h but not inttypes.h.
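
A minimal sketch of that header selection, assuming configure defines
HAVE_INTTYPES_H and HAVE_STDINT_H in the usual autoconf way. The fallback
define and the sum16() function are hypothetical additions so that the
fragment builds and can be exercised on its own; this is not the code in
dict.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Normally these macros come from config.h generated by configure.
 * Assumption for this sketch only: with neither defined, pretend that
 * <inttypes.h> was detected so the fragment compiles standalone.
 */
#if !defined(HAVE_INTTYPES_H) && !defined(HAVE_STDINT_H)
#define HAVE_INTTYPES_H 1
#endif

/* Prefer <inttypes.h>: per C99 it includes <stdint.h>, and some older
 * systems ship only <inttypes.h>. */
#if defined(HAVE_INTTYPES_H)
#include <inttypes.h>
#elif defined(HAVE_STDINT_H)
#include <stdint.h>
#endif

/* Tiny user of the fixed-width types: sum of bytes, wrapping modulo 65536. */
uint16_t sum16(const uint8_t *data, size_t len)
{
    uint16_t acc = 0;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++)
        acc = (uint16_t) (acc + data[i]);
    return acc;
}
```

In a real tree the two checks would come from something like
AC_CHECK_HEADERS([inttypes.h stdint.h]) in configure.in, and the fallback
define would be dropped.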

-- 
Andrew W. Nosenko [EMAIL PROTECTED]


Re: [xml] libxml2 2.7.1 breaks XML serialisation of HTML trees

2008-09-26 Thread Martin (gzlist)
On 25/09/2008, Daniel Veillard [EMAIL PROTECTED] wrote:

   Stephan, Martin,

  could you check the enclosed patch? I'm committing it to SVN head too,
  but it's probably easier to review that way.

I built trunk and had a play around, this handles my use case, thanks!

A couple of concerns. First, it's not *totally* clear which options
should be used together. That should at least be documented, and maybe
clearly defined in the code as well.

The patch has a bunch of additions like this:
 xmlSaveCtxtInit(ctxt);
+ctxt.options |= XML_SAVE_AS_XML;
Could that common case move into xmlSaveCtxtInit, to be overridden
afterwards in xmlNewSaveCtxt for the exceptional case of wanting to save
as HTML? Or could xmlNewSaveCtxt get logic to make the set of options on
the ctxt sane? Or could the format parameter in those functions be safely
upgraded to full options? The signature is the same and XML_SAVE_FORMAT
is 1 anyway, but I guess if people have been passing some other non-zero
value for formatting, that'd break compatibility.
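
The compatibility worry and the "make the options sane" idea can be sketched
in plain C. Everything here is hypothetical illustration rather than the
patch itself: the SKETCH_* constants are local copies of the xmlSaveOption
bit values as I read them in xmlsave.h (check your own header), and the two
functions are not libxml2 APIs.

```c
#include <assert.h>

/* Local stand-ins for XML_SAVE_FORMAT, XML_SAVE_NO_DECL, XML_SAVE_AS_XML
 * and XML_SAVE_AS_HTML; the values are an assumption to verify. */
enum {
    SKETCH_SAVE_FORMAT  = 1 << 0,
    SKETCH_SAVE_NO_DECL = 1 << 1,
    SKETCH_SAVE_AS_XML  = 1 << 5,
    SKETCH_SAVE_AS_HTML = 1 << 6
};

/*
 * Conservative mapping of a legacy `format` argument: any non-zero value
 * means "indent" and nothing else, so a caller that passed 2 does not
 * silently gain the "no XML declaration" behaviour.
 */
int options_from_format(int format)
{
    return format ? SKETCH_SAVE_FORMAT : 0;
}

/*
 * One possible normalization: default to XML output unless HTML output
 * was requested explicitly, and never leave both flags set.
 */
int normalize_save_options(int options)
{
    if (options & SKETCH_SAVE_AS_HTML)
        options &= ~SKETCH_SAVE_AS_XML;
    else
        options |= SKETCH_SAVE_AS_XML;
    assert(!((options & SKETCH_SAVE_AS_XML) && (options & SKETCH_SAVE_AS_HTML)));
    return options;
}
```

Reinterpreting the old argument directly as option bits would be shorter, but
it changes behaviour for any caller that passed a non-zero value other than 1,
which is the breakage mentioned above.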

Finally, having XML_SAVE_AS_HTML makes it seem like you could save any
xml-flavoured-html document in non-xml-flavoured form, but that's not
quite the case. One thing I found was that the overloading of
XML_CDATA_SECTION_NODE to also be HTML_PRESERVE_NODE means the
contents get output raw, see HTMLtree.c lines 838-843 in
htmlNodeDumpFormatOutput.

  Basically it adds 3 parsing options, and for the old entry points
  (xmlDump*, not xmlSave based) it forces XML_SAVE_AS_XML, bypassing
  the doc type in case of HTML documents. That should fix Stephan's problem
  and also provide ways to do things with xmlSave when available.
  For the 'problem' of the added meta, an XML_SAVE_IMMUTABLE option could
  be added, which sounds more generic, but I'm not adding this in the patch
  so as not to complicate things.

A don't-fiddle-with-the-tree option sounds like a possibility, though
I do find the duplication of xml:lang to lang useful.

  I hope I didn't miss any old entry point whose behaviour was modified in
  2.7.1, or any places where the new flags should be checked too,

This I haven't thoroughly tested, however.

Thanks for working on this,

Martin


[xml] libxml - source file list for cross compilation

2008-09-26 Thread Prashant R
I need to specify the sources that I need to build a static library with a
tiny footprint.


I need the following
xmlparser (not file based: creating a parser context and then using
xmlParseChunk)
DOM
C14n
xmlwriter (only the ability to write data to a buffer)


I do not need
SAX
SAX2
File IO ( as my environment has a different file system)
exceptions
debug support
xpath
xpointer
xinclude
regex
xtree

My goal is to get a library smaller than 500 KB. Is this achievable with
the features I need?

It would be great if I could get a list of the sources (.c files) that
would be needed.
It's been really hard to make a minimal build, taking into account the
dependencies and the fact that I cannot use ./configure

Thanks in advance