[xml] libxml newbie question on htmlParseChunk function

Van H Tran Fri, 02 Jun 2006 01:07:33 -0700

Hi all,
My very first post to this mailing list :)

Ok, I'm trying to unhtmlize (strip the HTML markup from) some text, using the SAX model.


Here is how I initialize the parser:

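#include <stdio.h>
#include <string.h>
#include <glib.h>
#include <libxml/HTMLparser.h>

/* SAX "characters" callback. The text is NOT NUL-terminated,
 * so only `length` bytes may be read. */
static void unhtmlizeHandleCharacters(void *user_data, const xmlChar *string,
                                      int length)
{
    fprintf(stderr, "string = %.*s\n", length, (const char *) string);
    /* process string here: append it to the caller's buffer */
    g_string_append_len((GString *) user_data, (const gchar *) string, length);
}

gchar *unhtmlize(const char *text)
{
    htmlSAXHandler *sax_p = g_new0(htmlSAXHandler, 1);
    GString *buffer = g_string_new(NULL);
    htmlParserCtxtPtr ctxt;

    sax_p->characters = unhtmlizeHandleCharacters;
    ctxt = htmlCreatePushParserCtxt(sax_p, buffer, text, (int) strlen(text),
                                    "", XML_CHAR_ENCODING_UTF8);
    htmlParseChunk(ctxt, NULL, 0, 1);    /* terminate = 1: parse what is left */
    htmlFreeParserCtxt(ctxt);            /* the context keeps its own SAX copy */
    g_free(sax_p);
    return g_string_free(buffer, FALSE); /* caller frees with g_free() */
}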
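The characters callback receives a pointer plus a length rather than a C string, and libxml2 may call it more than once for a single run of text. If GLib is not available, the pieces can be collected in plain C. Below is a minimal sketch, assuming a hypothetical struct text_buf and function collect_characters (neither is part of libxml2). Its signature mirrors libxml2's charactersSAXFunc, in which xmlChar is a typedef for unsigned char.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer for the collected text (not part of libxml2). */
struct text_buf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Same shape as libxml2's charactersSAXFunc (void *ctx, const xmlChar *ch, int len).
 * `ch` is NOT NUL-terminated and the callback may fire several times for one
 * run of text, so exactly `len` bytes are appended on each call. */
void collect_characters(void *user_data, const unsigned char *ch, int len)
{
    struct text_buf *buf = user_data;
    size_t need;

    if (len <= 0)
        return;
    need = buf->len + (size_t) len + 1;   /* +1 for the terminating '\0' */
    if (need > buf->cap) {
        size_t cap = buf->cap ? buf->cap : 64;
        char *grown;

        while (cap < need)
            cap *= 2;
        grown = realloc(buf->data, cap);
        if (grown == NULL)
            return;                        /* out of memory: drop this piece */
        buf->data = grown;
        buf->cap = cap;
    }
    memcpy(buf->data + buf->len, ch, (size_t) len);
    buf->len += (size_t) len;
    buf->data[buf->len] = '\0';            /* keep the result usable as a C string */
}
```

It would be hooked up with `sax_p->characters = collect_characters;` and `&buf` passed as the user_data argument of htmlCreatePushParserCtxt(). After parsing, buf.data holds the collected text (or is NULL if nothing arrived) and must be released with free().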


Interestingly, this works with 'normal' text. However, if

    text = "abc < xyz"

then the debug output in unhtmlizeHandleCharacters shows that it only receives "abc " as the string; everything after the '<' character is dropped.

So unhtmlize("abc < xyz") gives "abc " as the result.
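One possible workaround (an assumption, not something confirmed here): the HTML parser appears to treat '<' as the start of markup, so a '<' that cannot begin a tag could be escaped as "&lt;" before the text is handed to htmlCreatePushParserCtxt(). The parser decodes &lt; and should then pass a literal '<' back through the characters callback. The helper escape_stray_lt() and its rule for "cannot begin a tag" (the next character is not a letter, '/', '!' or '?') are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'd copy of `text` in which every
 * '<' that cannot start HTML markup is replaced by "&lt;".
 * Assumed heuristic: markup starts with '<' followed by a letter, '/', '!' or '?'.
 * Returns NULL if allocation fails; the caller frees the result. */
char *escape_stray_lt(const char *text)
{
    size_t len = strlen(text);
    char *out = malloc(len * 4 + 1);   /* worst case: every char is a stray '<' */
    char *p = out;

    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        /* text[i + 1] is the terminating '\0' on the last iteration */
        unsigned char next = (unsigned char) text[i + 1];

        if (text[i] == '<' && !isalpha(next) &&
            next != '/' && next != '!' && next != '?') {
            memcpy(p, "&lt;", 4);
            p += 4;
        } else {
            *p++ = text[i];
        }
    }
    *p = '\0';
    return out;
}
```

For example, escape_stray_lt("abc < xyz") yields "abc &lt; xyz", while real markup such as "<b>" is left untouched. The heuristic cannot help with input like "a<b" meant as a comparison, which still looks like a tag.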

How can I overcome this? Any reply is much appreciated.

Thanks in advance
TranVan Hoang,
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml
