<[EMAIL PROTECTED]> wrote :

> Bertrand Mansion wrote:
>
>> As far as I understand, UTF-8 will read 8859-1 without problem but
>> ISO-8859-1 will not be able to read UTF-8, unless everything in the UTF8
>> string uses only 8859-1 codes.
>
> You're wrong, I think.
>
> UTF-8 is a variable length encoding of character codes of the unicode
> code page. iso8859-1 is a definition of a code page, each character is
> encoded in exactly one byte.
>
> Unicode itself is a code page with many more characters than iso8859-1.
>
> Unicode, iso8859-1 and ASCII code pages share the following properties:
>
> a.) character codes 0 up to 127 in unicode are equal to ASCII codes.
> b.) character codes 128 up to 255 in unicode are equal to the iso8859-1
>     codes.
>
> Please note: A 'character code' is _not_ a byte! It's the number of the
> position of that character in a code page. The code page in iso8859-1 is
> only 8 bits wide and has 256 entries. The unicode code page is 21 bits
> wide, and not all positions are assigned to characters.
>
> In iso8859-1 all 256 character codes are encoded using simply one byte.
> The value of the byte is the character position in the code page.
>
> In UTF-8 character codes 0 up to 127 are encoded in one byte and
> character codes 128 up to 2047 are encoded in _two_ bytes! Higher codes
> take three or more bytes.
>
> That means the byte values of encoded character codes 0 up to 127 are
> equal in UTF-8 and iso8859-1, but character codes 128 up to 255 take two
> bytes in UTF-8 and one byte in iso8859-1.
>
> In iso8859-1 the byte value is always the character code. In UTF-8 this
> is only true for character codes 0 up to 127.
>
> However, in UTF-8 (the unicode code page encoding) you can encode
> character codes up to 31 bits wide, using 6 bytes.
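To make the quoted explanation concrete, here is a small sketch in Python (my choice for illustration, nothing to do with how sqlite or PHP handle this internally). The helper name byte_forms is made up for this example. It shows the bytes produced for one ASCII character, one accented character and the euro sign in both encodings, and what happens when bytes written in one encoding are read in the other.

```python
def byte_forms(ch):
    """Return (character code, UTF-8 bytes, ISO-8859-1 bytes or None)."""
    utf8 = ch.encode("utf-8")
    try:
        latin1 = ch.encode("iso-8859-1")
    except UnicodeEncodeError:
        latin1 = None  # the character has no position in iso8859-1
    return ord(ch), utf8, latin1


# "A" (code 65), e-acute (code 233), euro sign (code 8364)
for ch in ["A", "\u00e9", "\u20ac"]:
    code, utf8, latin1 = byte_forms(ch)
    latin1_hex = latin1.hex() if latin1 is not None else "not encodable"
    print(code, "utf-8:", utf8.hex(), "iso-8859-1:", latin1_hex)

# Pure ASCII text gives identical bytes in both encodings.
same_for_ascii = "abc".encode("utf-8") == "abc".encode("iso-8859-1")

# UTF-8 bytes read as iso8859-1: no error, but one character becomes two.
misread = "\u00e9".encode("utf-8").decode("iso-8859-1")

# iso8859-1 bytes read as UTF-8: a lone byte above 127 is not valid UTF-8.
try:
    "\u00e9".encode("iso-8859-1").decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
```

So ASCII-only data is byte-for-byte the same either way, an accented character is one byte in iso8859-1 but two in UTF-8, and the euro sign has no iso8859-1 byte at all.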
Thanks for the clear explanations :)

Does this mean that as long as I only use ASCII in a UTF-8 compiled sqlite
library, the db will also be usable with an ISO-8859-1 compiled version of
the library, but if I use, for instance, accented characters, it won't be
compatible anymore?

I am asking because I once created an 8859-1 db and it could be read and
modified in the UTF-8 version of the library. I haven't tested the other
way though. What will happen if I update fields with accented characters in
my application compiled with UTF-8 and then try to open the db with, let's
say, the PHP sqlite extension? I'll try to see what happens.

On the php site, they warn users:

<quote>
The default PHP distribution builds libsqlite in ISO-8859-1 encoding mode.
However, this is a misnomer; rather than handling ISO-8859-1, it operates
according to your current locale settings for string comparisons and sort
ordering. So, rather than ISO-8859-1, you should think of it as being
'8-bit' instead.
</quote>

I am not sure what this means?

<quote>
It is not recommended that you use PHP in a web-server configuration with a
version of the SQLite library compiled with UTF-8 support, since libsqlite
will abort the process if it detects a problem with the UTF-8 encoding.
</quote>

So, it looks like it is recommended not to use UTF8. But how then can I
deal with characters like the euro symbol? I guess that I am stuck?

Bertrand Mansion
Mamasam