Greetings.

I just started using the libxml2 library for HTML parsing. One of my
requirements is to parse multiple HTML fragments separately and
combine them into a single HTML document at the end. However,
<html> and <body> tags get added to each fragment that is processed.

I was looking at the thread at 
http://mail.gnome.org/archives/xml/2010-January/msg00112.html and it seems like 
this is exactly the same issue I have. I thought adding the
HTML_PARSE_NOIMPLIED option would resolve the issue, but that doesn't seem to
work. In fact, the htmlCtxtUseOptions(...) function doesn't
recognize the HTML_PARSE_NOIMPLIED option.

Here is part of the source code I've written. I'm using libxml2 2.7.8,
the latest version. The following code is executed for
each HTML fragment that is processed.

...
htmlParserCtxtPtr parser = htmlCreatePushParserCtxt(NULL, NULL, NULL, 0, NULL, 0);
/* Request error recovery, silence errors and warnings, and ask the parser
   not to add implied <html>/<body> elements. */
int i = htmlCtxtUseOptions(parser, HTML_PARSE_RECOVER | HTML_PARSE_NOERROR |
                                   HTML_PARSE_NOWARNING | HTML_PARSE_NOIMPLIED);
printf("HTML CTXT %d\n", i); /* prints 8192, which corresponds to HTML_PARSE_NOIMPLIED */
htmlParseChunk(parser, htmlFragment, strlen(htmlFragment), 0);
...
htmlNodeDump(buffer, doc, xmlDocGetRootElement(doc)); /* adds <html> and <body> tags for each fragment! */

Any pointers or suggestions on how to work around this issue?

Thanks!
Stan
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
