> I just wanted to know how much space in bytes the Latin-1
> characters such as the German umlaut characters take up in
> UTF-8 encoding. Is it still just one byte or does it now
> require 2 bytes?

The umlauts now require 2 bytes. In general:

U+0000 up to U+007F take 1 byte (ASCII).

U+0080 up to U+07FF take 2 bytes (the upper half of Latin-1, Latin Extended, combining diacritics, phonetics, Greek, Cyrillic, Hebrew, Arabic, Syriac, and some more scripts). This is very little expansion, especially for languages that use only a few non-ASCII characters, like Swedish or German, but it is expensive for Greek, Arabic and the like, where nearly every letter grows from 1 byte in a legacy 8-bit code page to 2 bytes.

U+0800 up to U+FFFD take 3 bytes (Hangul, CJK, ...). Not too expensive, but significant.

U+10000 up to U+10FFFD take 4 bytes (this is all the rest). These take 4 bytes almost everywhere, UTF-16 and UTF-32 included, so this is no significant expansion.
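If you want to check these ranges yourself, here is a minimal Python sketch. The function name utf8_length and the sample characters are my own choices for illustration, not anything taken from the standard; the loop simply compares the range-based answer with what Python's built-in UTF-8 encoder produces.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for one Unicode code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point <= 0x7F:       # ASCII
        return 1
    if code_point <= 0x7FF:      # Latin-1 upper half, Greek, Cyrillic, Hebrew, Arabic, ...
        return 2
    if code_point <= 0xFFFF:     # rest of the BMP (includes noncharacters U+FFFE, U+FFFF)
        return 3
    return 4                     # supplementary planes


# Sample characters, written as escapes so the file itself stays pure ASCII.
SAMPLES = [
    "a",            # U+0061 LATIN SMALL LETTER A
    "\u00e4",       # U+00E4 a with diaeresis (German umlaut)
    "\u00df",       # U+00DF sharp s
    "\u03b1",       # U+03B1 GREEK SMALL LETTER ALPHA
    "\ud55c",       # U+D55C a Hangul syllable
    "\U00010000",   # first code point beyond the BMP
]

for ch in SAMPLES:
    expected = utf8_length(ord(ch))
    actual = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: {expected} byte(s), encoder says {actual}")
```

For the umlaut U+00E4 both numbers come out as 2, which is the answer to the original question.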
If space is a concern, use SCSU (the Standard Compression Scheme for Unicode). It is shorter and has the additional advantage of being much more compressible by zip or comparable algorithms.

-- 
Dominikus Scherkl
[EMAIL PROTECTED]