As far as I understand, UTF-8 will read 8859-1 without problem but ISO-8859-1 will not be able to read UTF-8, unless everything in the UTF8 string uses only 8859-1 codes.
You're wrong, I think.
UTF-8 is a variable-length encoding of the character codes of the Unicode code page. ISO-8859-1 is a definition of a code page in which each character is encoded in exactly one byte.
Unicode itself is a code page with many more characters than ISO-8859-1.
The Unicode, iso8859-1 and ASCII code pages share the following properties:
a.) character codes 0 up to 127 in unicode are equal to the ASCII codes.
b.) character codes 128 up to 255 in unicode are equal to the iso8859-1 codes.
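Properties a.) and b.) can be checked with a small sketch. It assumes Python 3 and its built-in "ascii" and "iso-8859-1" codecs; the variable names are only illustrative.

```python
# For every byte value 0..255, decode it as ISO-8859-1 and compare the
# resulting Unicode character code with the byte value itself.
latin1_matches_unicode = all(
    ord(bytes([b]).decode("iso-8859-1")) == b for b in range(256)
)

# ASCII only defines codes 0..127; those agree with Unicode as well.
ascii_matches_unicode = all(
    ord(bytes([b]).decode("ascii")) == b for b in range(128)
)

print(latin1_matches_unicode, ascii_matches_unicode)  # True True
```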
Please note: A 'character code' is _not_ a byte! It's the number of the position of that character in a code page. The code page in iso8859-1 is only 8 bits wide and has 256 entries. The unicode code page is 21 bits wide, and not all positions are assigned to characters.
In iso8859-1 all 256 character codes are encoded using simply one byte. The value of the byte is the character position in the code page.
In UTF-8 character codes 0 up to 127 are encoded in one byte, and character codes above 127 are encoded in _two or more_ bytes: two bytes for 128 up to 2047, three bytes for 2048 up to 65535, and four bytes above that.
That means the byte values of encoded character codes 0 up to 127 are equal in UTF-8 and iso8859-1, but character codes 128 up to 255 take two bytes in UTF-8 and one byte in iso8859-1.
In iso8859-1 the byte value is always the character code. In UTF-8 this is only true for character codes 0 up to 127.
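To make the difference concrete, here is a minimal sketch (again assuming Python 3 and its standard codecs) that encodes one character below 128 and one above it, and then tries to read each encoding as the other. The names decoded_ok and misread are just for illustration.

```python
for ch in ("A", "\u00e9"):  # 'A' is code 65, e-acute is code 233
    latin1 = ch.encode("iso-8859-1")
    utf8 = ch.encode("utf-8")
    print(ord(ch), latin1.hex(), utf8.hex())
# 65 41 41
# 233 e9 c3a9

# Reading an iso8859-1 byte above 127 as UTF-8 is rejected,
# because a lone 0xE9 is not a complete UTF-8 sequence.
try:
    b"\xe9".decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Reading the UTF-8 bytes as iso8859-1 raises no error, but each of the
# two bytes is taken as its own character (codes 195 and 169).
misread = "\u00e9".encode("utf-8").decode("iso-8859-1")
print(decoded_ok, [ord(c) for c in misread])  # False [195, 169]
```

So neither side reads the other correctly once a character code above 127 is involved; only plain ASCII text survives in both directions.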
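However, UTF-8 as originally designed could encode character codes up to 31 bits wide, using up to 6 bytes. The current definition (RFC 3629) stops at U+10FFFF, the top of the unicode code page, so an encoded character takes at most 4 bytes.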
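The number of bytes per character code can be checked at the range boundaries. This is only a sketch, assuming Python 3, whose "utf-8" codec follows the current four-byte limit and does not produce the older 5- and 6-byte forms; the list name is hypothetical.

```python
# Character codes just below and at each point where UTF-8 needs one more byte.
boundaries = [0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]

lengths = [len(chr(code).encode("utf-8")) for code in boundaries]

for code, n in zip(boundaries, lengths):
    print(f"U+{code:04X} -> {n} byte(s)")
```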