Hi all,
This is my very first post to this mailing list :)
I'm trying to unhtmlize some text using the SAX model.
Here is how I set up the handler and initialize the parser:
void unhtmlizeHandleCharacters(void *user_data, const xmlChar *string,
                               int length)
{
    /* string is not guaranteed to be NUL-terminated: print only length bytes */
    fprintf(stderr, "string = %.*s", length, (const gchar *)string);
    /* process string here... */
}
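void unhtmlize(const char *string)
{
    /* buffer is the user data handed to the callbacks; it is defined elsewhere */
    htmlSAXHandler *sax_p = g_new0(htmlSAXHandler, 1);
    sax_p->characters = unhtmlizeHandleCharacters;
    htmlParserCtxtPtr ctxt =
        htmlCreatePushParserCtxt(sax_p, buffer, string,
                                 strlen(string), "",
                                 XML_CHAR_ENCODING_UTF8);
    /* an empty final chunk with terminate = 1 ends the parse */
    htmlParseChunk(ctxt, string, 0, 1);
}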
What's interesting is that this works with 'normal' text.
However, if
text = "abc < xyz"
then the debug output in unhtmlizeHandleCharacters shows that the
handler only receives "abc " as the string; everything from the '<'
character onwards is omitted.
So unhtmlize("abc < xyz") gives "abc " as the result.
How can I overcome this? Any reply is much appreciated.
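One possible workaround, given here only as a rough sketch: escape the markup characters before the text reaches the parser, so that a literal '<' arrives as the entity &lt; and should come back through the characters callback as plain text. The helper name escape_markup is hypothetical (it is not part of libxml2), and the sketch assumes that every '<', '>' and '&' in the input is meant literally. It would therefore also neutralise any genuine tags, so it only fits input known to be plain text.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a libxml2 function: returns a newly allocated
 * copy of text with '&', '<' and '>' replaced by HTML entities.
 * The caller must free() the result. Returns NULL if allocation fails. */
char *escape_markup(const char *text)
{
    /* Worst case: every input byte expands to "&amp;" (5 bytes). */
    size_t len = strlen(text);
    char *out = malloc(len * 5 + 1);
    char *p = out;

    if (out == NULL)
        return NULL;

    for (; *text != '\0'; text++) {
        switch (*text) {
        case '&': memcpy(p, "&amp;", 5); p += 5; break;
        case '<': memcpy(p, "&lt;", 4);  p += 4; break;
        case '>': memcpy(p, "&gt;", 4);  p += 4; break;
        default:  *p++ = *text;          break;
        }
    }
    *p = '\0';
    return out;
}
```

The escaped copy would then be passed to unhtmlize() in place of the raw text. Note that the parser may deliver the result through several calls to the characters callback rather than as one string.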
Thanks in advance
TranVan Hoang,