Re: [xml] A few question about parsing content

Daniel Veillard Thu, 03 Apr 2008 04:58:29 -0700

On Mon, Mar 31, 2008 at 09:45:49AM +0100, Julien Chaffraix wrote:
> Hi everyone,
> 
> I have an application that has to parse a "content" ( ::= (element |
> CharData | Reference | CDSect | PI | Comment)* as specified in the
> libxml documentation).
> Currently we are using xmlParseBalancedChunkMemory to parse it but it
> has induced code duplication (mainly due to the fact that we cannot
> tune the behavior
> with a xmlParserCtxt).
> I am trying to find a replacement for that API that should match the
> behaviour of xmlParseBalancedChunkMemory (we do not provide xmlDocPtr
> and xmlNodePtr as we build the representation ourselves using SAX2
> callbacks).
> Looking at the documentation, I found 3 candidates:
> - xmlParseBalancedChunkMemoryRecover
> - xmlParseInNodeContext
> - xmlParseContext
> 
> First, have I found all the candidates? (I am quite new to libxml so
> it is likely that I have missed some)


  That looks right to me.

> Then, is there a way to choose between them so that I have a behavior
> as close to xmlParseBalancedChunkMemory's as possible by providing a
> well-crafted xmlParserCtxt (a pointer about which type to use / how to
> initialize it would also be appreciated)?

  The problem is that what you are trying to do is not specified in the
spec as normal XML parsing: all the spec defines is how to parse
a document, not a subset. Since the spec basically exists for
interoperability, there is a good reason to try to enforce this, and I
consider it normal except maybe for applications like editors. The fact
that you use SAX makes your request look a bit suspicious actually: your
application seems to be trying to do something which is not interoperable,
and not surprisingly it's harder to do with existing APIs...
  The only other thing I could think of would be for you to set up
a complete parser context and call xmlParseContent(), then do the parser
cleanup at the end. It's really low level and requires more knowledge of
the parser internals, but I guess it's the price to pay for an a priori
non-conformant behaviour.
  There are many things which are contextual when parsing an XML fragment,
and you will have to recreate that context or you won't parse things
properly (e.g. the namespace declarations in scope).

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/