Hello:

My knowledge of encoding is very poor, and you seem to know a lot about this. Could you explain a bit more of what you said? I have done the following:

This is the problematic sequence: 11110011-01101110-00100000-01001101 (F3-6E-20-4D). If I follow the instructions that appear under the question "What is UTF-8?" in the UTF-8 FAQ, I obtain the following:
011101110100000001101 instead of 1EE80D, which is 111101110100000001101. (Have I made a mistake?) Applying the UTF-16 encoding to my result, everything works out. So, finally, who do you think is responsible for this strange situation: the client, for saying that the document is UTF-8, or the parser?
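
The hand calculation above can be checked with a short Python sketch. It assumes a lax decoder that takes the three payload bits of the lead byte and the low six bits of each following byte without checking that those bytes have the form 10xxxxxx. The function name lax_decode_4byte is made up for illustration and is not part of any library.

```python
def lax_decode_4byte(data: bytes) -> int:
    """Read 4 bytes as a UTF-8 four-byte sequence WITHOUT validating them.

    Hypothetical helper: it mimics a decoder that ignores malformed input.
    """
    if len(data) != 4:
        raise ValueError("expected exactly 4 bytes")
    lead, *trail_bytes = data
    # A lead byte of the form 11110xxx carries 3 payload bits.
    value = lead & 0x07
    for trail in trail_bytes:
        # A strict decoder would reject any trail byte that is not 10xxxxxx;
        # here the top two bits are simply thrown away.
        value = (value << 6) | (trail & 0x3F)
    return value


raw = bytes([0xF3, 0x6E, 0x20, 0x4D])
value = lax_decode_4byte(raw)
print(format(value, "021b"))  # 011101110100000001101
print(format(value, "06X"))   # 0EE80D
```

A strict decoder refuses the same bytes: in Python, raw.decode("utf-8") raises UnicodeDecodeError, while raw.decode("latin-1") returns "ón M".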


Regards,
Mario.



From: Pim Blokland <[EMAIL PROTECTED]>
To: Unicode mailing list <[EMAIL PROTECTED]>
Subject: Re: Problems  encoding the spanish o
Date: Mon, 17 Nov 2003 13:26:19 +0100

pepe pepe wrote:

> We have the following sequence of characters "...ización Map.." that is
> the same than "...ización Map..." that after suffering some
> transformations becomes to "...izaci&#56186;&56333;ap...."
> AS you can see the two characters 56186 and 56333 seem to represent this
> sequences "ón M". Any idea?.

Yes, your input text obviously gets flagged as being in UTF-8
format, even if it is Latin-1 (or any codepage that has a ó at index
243).
Not only that, but the process making the mistake of thinking it is
UTF-8 also makes the mistake of not generating an error for
encountering malformed byte sequences, AND of outputting the result
as two 16-bit numbers instead of one 21-bit number.

If you take the byte sequence (hex) F3 6E 20 4D and treat it as
UTF-8 and don't care it's not valid, this maps to the value
(hex)1EE80D. Again, not caring this is not a valid codepoint,
turning this into UTF-16 would yield U+DB7A U+DC0D, which is what
you got in your output.
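
The UTF-16 step can be sketched in Python as follows. This is only an illustration: the helper name to_surrogates is made up, and the input value 0xEE80D is an assumption. It is what the payload bits of F3 6E 20 4D give when worked through by hand (011 101110 100000 001101), rather than the 1EE80D quoted above.

```python
from typing import Tuple


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane value into a UTF-16 surrogate pair.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("value is outside the range that needs surrogates")
    offset = code_point - 0x10000      # leaves a 20-bit number
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


high, low = to_surrogates(0xEE80D)
print(f"{high:04X} {low:04X}")  # DB7A DC0D
print(high, low)                # 56186 56333
```

The two decimal values are the numbers that appear in the character references of the damaged output.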

Pim Blokland



