I imagine my problem is due to my own ignorance of how character encodings work and how libxml2 handles them, but I’m growing frustrated with my inability to figure it out, so I thought I’d ask the list for advice.
Given this small program:

    /*
     * author: Lucas Brasilino <brasil...@recife.pe.gov.br>
     * copy: see Copyright for the status of this software
     * hacked up by Fred Smith to illustrate a problem I'm having.
     */
    #include <stdio.h>
    #include <libxml/parser.h>
    #include <libxml/tree.h>

    int main(int argc, char **argv)
    {
        xmlDocPtr doc = NULL;       /* document pointer */
        xmlNodePtr root_node = NULL, node = NULL, node1 = NULL; /* node pointers */
        xmlDtdPtr dtd = NULL;       /* DTD pointer */
        char buff[256];
        int i, j;
        xmlChar * convstr;
        char tststr[40];
        xmlNodePtr sub;

        LIBXML_TEST_VERSION;

        doc = xmlNewDoc(BAD_CAST "1.0");

        snprintf (tststr, sizeof(tststr), "Test %c Test", 0xC9);
        convstr = xmlEncodeEntitiesReentrant (doc, (xmlChar *)tststr);
        if (convstr)
        {
            printf ("tststr:  %s\n", tststr);
            printf ("convstr: %s\n", convstr);
            free (convstr);
        }

        xmlFreeDoc(doc);
        xmlCleanupParser();
        xmlMemoryDump();
        return(0);
    }

I get this output:

    $ ./tree
    tststr:  Test � Test
    convstr: Test &#x260;Test

hexdump reveals it as:

    000000: 73 74 73 74 72 3a 20 20 54 65 73 74 20 c9 20 54  ststr:  Test . T
    000010: 65 73 74 0a 63 6f 6e 76 73 74 72 3a 20 54 65 73  est.convstr: Tes
    000020: 74 20 26 23 78 32 36 30 3b 54 65 73 74 0a        t &#x260;Test.

Now I’m puzzled, because the output from xmlEncodeEntitiesReentrant() seems clearly (to me) to be wrong. First, it has swallowed not only the 0xC9 but also the character following it. Just as bad, when the app that should be receiving this actually gets it, it is unable to reconstruct the Unicode code point that appeared in the original text (i.e., 0xC9, which represents a capital E with acute accent).

I’m sure I’m doing something wrong here, but I am unable to see it, so your advice will be appreciated.

Thanks in advance!

Fred Smith
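A note on the arithmetic in the hexdump above: libxml2 treats xmlChar strings as UTF-8, and the 0x260 in the entity is exactly what results if the two bytes C9 20 are read as a single two-byte UTF-8 sequence without checking that the second byte is a valid continuation byte. The sketch below is only an illustration of that bit arithmetic, under the assumption that this is what happens inside xmlEncodeEntitiesReentrant(); it has not been checked against the libxml2 source. The function names decode_two_byte_lenient() and latin1_to_utf8() are hypothetical helpers, not libxml2 APIs.

```c
#include <stddef.h>

/*
 * Hypothetical helper: combine a lead byte and the byte after it the way a
 * two-byte UTF-8 sequence (110xxxxx 10yyyyyy) is decoded, but WITHOUT
 * verifying that the second byte really has the 10yyyyyy form.
 */
unsigned int decode_two_byte_lenient(unsigned char lead, unsigned char next)
{
    return ((unsigned int)(lead & 0x1F) << 6) | (unsigned int)(next & 0x3F);
}

/*
 * Hypothetical helper: encode one ISO-8859-1 (Latin-1) byte as UTF-8.
 * Latin-1 byte values equal their Unicode code points, so values below
 * 0x80 take one byte and the rest take two. Returns the bytes written.
 */
size_t latin1_to_utf8(unsigned char c, unsigned char out[2])
{
    if (c < 0x80) {
        out[0] = c;
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (c >> 6));   /* 110000xx */
    out[1] = (unsigned char)(0x80 | (c & 0x3F)); /* 10yyyyyy */
    return 2;
}
```

With these, decode_two_byte_lenient(0xC9, 0x20) gives 0x260, matching the entity in the dump and the missing space, while latin1_to_utf8(0xC9, out) produces the bytes C3 89, which is the UTF-8 form of U+00C9. If the assumption holds, converting the Latin-1 input to UTF-8 before calling xmlEncodeEntitiesReentrant() would be the thing to try; libxml2's encoding.h has offered isolat1ToUTF8() for that conversion, though its availability should be checked against the documentation for the version in use.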