On Fri, 23 May 2008 13:59:56 -0400 John W Kennedy wrote:

> On May 23, 2008, at 8:20 AM, Michael Adams wrote:
> >
> > This bit i am fairly hazy about: UTF-16 allows 256 * 256 or 65500+
> > characters and UTF-32 allows 256 * 256 * 256 * 256 characters and
> > are International standards.
>
> Not precisely. UTF-16 allows 256 * 256 - 2048 + 1024 * 1024, or
> 1,112,064 characters, 63,488 being two bytes, and 1,048,576 being four
> bytes. 1024 characters out of the 65,536 possible two-byte codes are
> reserved to be used as the first half of a four-byte character, and
> another 1024 as the second half.
>
> UTF-32 allows only the same 1,112,064 characters. UTF-32 is obviously
> wasteful, and is not meant to be used except in cases where you want
> to be able to find the nth character in a string without counting.
> (You can do the same thing with UTF-16 if all the characters fit in
> the base 63,488, which will usually be the case unless you're using
> something rare, such as Egyptian hieroglyphics or abnormal Chinese
> dialects.)
>
> UTF-8 also allows only the same 1,112,064 characters, in one, two,
> three, or four bytes. UTF-8 normally takes less space than UTF-16 if
> most of the characters are in US-ASCII, but tends to take more space
> otherwise.
>
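
For anyone wanting to check the arithmetic quoted above, here is a minimal Python sketch. The variable names and the helper encoded_sizes are made up for illustration; the only library call is the standard str.encode. The "-le" codec variants are used so that no byte-order mark is counted in the sizes.

```python
# Sketch only: reproduces the code-point counts from the quoted message.
CODE_UNITS_16 = 256 * 256      # 65,536 possible 16-bit values
SURROGATES = 2048              # 1,024 "first half" + 1,024 "second half" codes

two_byte = CODE_UNITS_16 - SURROGATES   # characters that fit in two bytes
four_byte = 1024 * 1024                 # characters that need a surrogate pair
total = two_byte + four_byte            # everything UTF-16 can represent

def encoded_sizes(text):
    """Return the number of bytes text occupies in each encoding (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

print(two_byte, four_byte, total)
print(encoded_sizes("hello"))           # US-ASCII text
print(encoded_sizes("\u4e2d\u6587"))    # two common Chinese characters
print(encoded_sizes("\U00013000"))      # one Egyptian hieroglyph
```

The ASCII string is smallest in UTF-8, the two Chinese characters are smallest in UTF-16, and the hieroglyph takes four bytes in all three encodings, which matches the description above.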
Thanks for that.

--
Michael

All shall be well, and all shall be well, and all manner of things shall be well
 - Julian of Norwich 1342 - 1416