RE: [xml] French character encoding problem

Fred Fung Thu, 15 Sep 2005 14:28:14 -0700

Daniel,

As a fellow competent programmer, I exhausted every combination I could think 
of to make something work before coming here for help. I have also read and 
re-read the documentation page several times without getting anywhere. I came 
to this list for help and suggestions, not to be called stupid.

Also, obviously my problem was not properly read and understood before it was 
answered.

The byte sequence for "Ç" that would appear in an xml or html page is "&#199;" 
as I stated in my first email.

I understand that all strings are internally encoded as UTF-8. But what I want 
to achieve is this: once I retrieve the UTF-8 encoded string into a C variable, 
how can I convert the UTF-8 encoded sequence "#C3#87" back to the corresponding 
"Ç" character so that other parts of my application can use this character 
instead of the UTF-8 sequence?

As I said in my original email, I ran xmllint on the xml file and it was able 
to output "Ç" properly on my screen, NOT the UTF-8 encoded string. So there 
must be something that I should be calling to do the conversion.
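
For illustration, the conversion asked about here can be sketched in plain C. 
This is only a minimal sketch, assuming the application wants ISO-8859-1 
(Latin-1) bytes; utf8_to_latin1() is a hypothetical helper name, not a libxml2 
function. libxml2 itself declares UTF8Toisolat1() in <libxml/encoding.h> for 
the same purpose, and that is probably the better choice in real code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libxml2): convert a NUL-terminated
 * UTF-8 string to ISO-8859-1.
 * Returns the number of bytes written, not counting the terminating NUL,
 * or -1 if the input is malformed, contains a character above U+00FF,
 * or does not fit in the output buffer.
 */
int utf8_to_latin1(const unsigned char *in, unsigned char *out, size_t outsize)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (outsize == 0)
        return -1;                      /* no room even for the NUL */

    while (*in != 0) {
        unsigned int c;

        if (*in < 0x80) {
            c = *in++;                  /* plain ASCII byte */
        } else if ((*in == 0xC2 || *in == 0xC3) && (in[1] & 0xC0) == 0x80) {
            /* two-byte sequence 1100001x 10yyyyyy covers U+0080..U+00FF */
            c = ((unsigned int)(*in & 0x1F) << 6) | (in[1] & 0x3F);
            in += 2;
        } else {
            return -1;                  /* malformed or outside Latin-1 */
        }
        if (n + 1 >= outsize)
            return -1;                  /* keep room for the NUL */
        out[n++] = (unsigned char)c;
    }
    out[n] = 0;
    return (int)n;
}
```

Passing the two bytes 0xC3 0x87 to this helper yields the single byte 0xC7, 
which is "Ç" in ISO-8859-1. Characters that Latin-1 cannot represent are 
reported as an error rather than silently dropped.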

Please, if you are not able to help, just say so, or just don't bother to reply.

Regards,

Fred

-----Original Message-----
From: Daniel Veillard [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 15, 2005 4:37 PM
To: Fred Fung
Cc: [email protected]
Subject: Re: [xml] French character encoding problem

On Thu, Sep 15, 2005 at 12:51:40PM -0400, Fred Fung wrote:
> Daniel,
> 
> Thanks for the prompt reply.
> 
> I already tried "ISO-8859-1" (and just tried again after reading your reply) 
> and I still get the same result.

  Yes, that's normal. Whichever encoding you use, you will get the same result.

> Already read the encoding.html page a few times. According to this 
> page, does that mean that by specifying encoding to be ISO-8859-1, one 
> can put "Ç" in the xml file ?

  What is "Ç" ? What byte sequence ? Corresponding to what unicode code 
point(s) ?

> What about if they choose to generate &#199; instead of the character ?
> I actually just tried putting "Ç" in the xml file with encoding ISO-8859-1.
> xmlNodeGetContent() still returns "Ã" instead.

  It returned the 2 bytes corresponding to that code point in the UTF-8 
encoding. The fact that all strings are encoded in UTF-8 internally is written 
on that page. 
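
The relationship between the code point and those 2 bytes can be sketched as 
follows. This is an illustrative sketch only; encode_utf8() is a hypothetical 
helper name, not a libxml2 function. It shows how code point 199 (U+00C7, the 
character written as &#199;) becomes the bytes 0xC3 0x87 in UTF-8. Displayed 
as ISO-8859-1, the first of those bytes is "Ã" and the second is a 
non-printing control character.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libxml2): encode one Unicode code point
 * as UTF-8 into out, which must have room for 4 bytes.
 * Returns the number of bytes written (1 to 4), or 0 if cp is not a valid
 * Unicode scalar value.
 */
int encode_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                        /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {                    /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond U+10FFFF */
}
```

Calling encode_utf8(199, buf) writes 0xC3 and 0x87 into buf and returns 2, 
whatever encoding the source file declared.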

> Also, if xmllint is able to return the proper character, what am I 
> missing that's causing xmlNodeGetContent() not to?

  That all internal representations are kept in UTF-8.
It is clear you did not understand that page. Make sure you understand it.

  "One of the core decisions was to force all documents to be converted to
   a default internal encoding, and that encoding to be UTF-8"

 There are a few pointers at the beginning of that page explaining more about 
encodings, code points and unicode and how they relate. Until you are 
familiar with those, you will continue to have trouble, I'm afraid.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/ 
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
