Daniel,

As a fellow competent programmer, I exhausted every combination I could think of to make this work before coming here. I have also read and re-read the documentation page several times without getting anywhere. I came to this list for help and suggestions, not to be called stupid.
Also, my problem was obviously not read and understood properly before it was answered. The byte sequence for "Ç" that would appear in an XML or HTML page is "Ç", as I stated in my first email.

I understand that all strings are internally encoded as UTF-8. What I want to know is this: once I have retrieved the UTF-8 encoded string into a C variable, how can I convert the UTF-8 encoded sequence "#C3#87" back to the corresponding "Ç" character, so that the other parts of my application can use that character instead of the UTF-8 sequence?

As I said in my original email, I ran xmllint on the XML file and it output "Ç" properly on my screen, NOT the UTF-8 encoded string. So there must be something that I should be calling to do the conversion.

Please, if you are not able to help, just say so, or just don't bother to reply.

Regards,
Fred

-----Original Message-----
From: Daniel Veillard [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 15, 2005 4:37 PM
To: Fred Fung
Cc: [email protected]
Subject: Re: [xml] French character encoding problem

On Thu, Sep 15, 2005 at 12:51:40PM -0400, Fred Fung wrote:
> Daniel,
>
> Thanks for the prompt reply.
>
> I already tried "ISO-8859-1" (and just tried again after reading your reply)
> and I still get the same result.

Yes, that's normal. Whatever encoding you use, you will get the same result.

> Already read the encoding.html page a few times. According to this
> page, does that mean that by specifying encoding to be ISO-8859-1, one
> can put "Ç" in the xml file ?

What is "Ç"? What byte sequence? Corresponding to what Unicode code point(s)?

> What about if they choose to generate Ç instead of the character ?
> I actually just tried putting "Ç" in the xml file with encoding ISO-8859-1.
> xmlNodeGetContent() still returns "Ã" instead.

It returned the 2 bytes corresponding to that code point in the UTF-8 encoding. The fact that all strings are encoded in UTF-8 internally is written on that page.
> Also, if xmllint is able to return the proper character, what am I
> missing that's causing xmlNodeGetContent() not ?

That all internal representations are kept in UTF-8. It is clear you did not understand that page. Make sure you understand it:

"One of the core decisions was to force all documents to be converted to a default internal encoding, and that encoding to be UTF-8"

There are a few pointers at the beginning of that page explaining more about encodings, code points and Unicode and how they relate. Until you are familiar with those, you will continue to have trouble, I'm afraid.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
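
Note on the conversion asked about in this thread: since xmlNodeGetContent() hands back UTF-8, getting a single-byte "Ç" means converting the UTF-8 bytes to a single-byte encoding such as ISO-8859-1 after retrieval. The following is only a minimal sketch in plain C, not code from either poster. The helper name utf8_to_latin1 and its error convention are hypothetical and are not part of libxml2. It assumes the target encoding is ISO-8859-1, so it handles only code points up to U+00FF and rejects everything else. libxml2 itself declares UTF8Toisolat1() in <libxml/encoding.h> for the same job, which is probably the better choice in real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (NOT a libxml2 function): convert a NUL-terminated
 * UTF-8 string to ISO-8859-1.
 * Returns the number of bytes written, excluding the terminating NUL, or -1
 * if the input is malformed, contains a code point above U+00FF, or does not
 * fit in the output buffer. */
int utf8_to_latin1(const unsigned char *in, unsigned char *out, size_t outsize)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (outsize == 0)
        return -1;

    while (*in != 0) {
        unsigned int cp;

        if (*in < 0x80) {
            /* One-byte sequence: plain ASCII, copied unchanged. */
            cp = *in++;
        } else if ((*in == 0xC2 || *in == 0xC3) && (in[1] & 0xC0) == 0x80) {
            /* Two-byte sequence covering U+0080..U+00FF:
             * lead byte 110000xx, continuation byte 10yyyyyy. */
            cp = ((unsigned int)(in[0] & 0x1F) << 6) | (in[1] & 0x3F);
            in += 2;
        } else {
            /* Malformed input or a code point outside ISO-8859-1. */
            return -1;
        }

        /* Keep room for this byte and the terminating NUL. */
        if (n + 1 >= outsize)
            return -1;
        out[n++] = (unsigned char)cp;
    }

    out[n] = 0;
    return (int)n;
}
```

With this sketch, the two bytes 0xC3 0x87 from the thread come out as the single byte 0xC7, which is "Ç" in ISO-8859-1. In a libxml2 program the input would be the xmlChar * returned by xmlNodeGetContent() (xmlChar is an unsigned char), and that string should still be released with xmlFree() afterwards.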
