Re: [users] Character Encodings

Michael Adams Fri, 23 May 2008 13:11:09 -0700

On Fri, 23 May 2008 13:59:56 -0400
John W Kennedy wrote:

> On May 23, 2008, at 8:20 AM, Michael Adams wrote:
> >
> > This bit I am fairly hazy about: UTF-16 allows 256 * 256, or 65,500+,
> > characters and UTF-32 allows 256 * 256 * 256 * 256 characters, and
> > both are international standards.
> 
> Not precisely. UTF-16 allows 256 * 256 - 2048 + 1024 * 1024, or
> 1,112,064 characters: 63,488 taking two bytes and 1,048,576 taking
> four bytes. 1024 of the 65,536 possible two-byte codes are
> reserved to be used as the first half of a four-byte character, and
> another 1024 as the second half.
> 
> UTF-32 allows only the same 1,112,064 characters. UTF-32 is obviously
> wasteful, and is not meant to be used except in cases where you want
> to be able to find the nth character in a string without counting
> from the start. (You can do the same thing with UTF-16 if all the
> characters fit in the base 63,488, which will usually be the case
> unless you're using something rare, such as Egyptian hieroglyphs or
> uncommon Chinese characters.)
> 
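> UTF-8 also allows only the same 1,112,064 characters, in one, two,
> three, or four bytes. UTF-8 normally takes less space than UTF-16 if
> most of the characters are in US-ASCII. It takes the same space for
> scripts such as Greek, Cyrillic, Hebrew, or Arabic, and more space for
> Chinese, Japanese, or Korean, whose characters need three bytes in
> UTF-8 but only two in UTF-16.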
>
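
The UTF-16 arithmetic quoted above can be checked with a short Python 3 sketch. It is an illustration, not anything from the thread: the helper names `to_utf16_units` and `codec_units` are hypothetical, and only the built-in `str.encode` codec is used for comparison. It shows how a code point above U+FFFF is split into one of the 1024 "first half" codes and one of the 1024 "second half" codes (a surrogate pair).

```python
# Counts from the quoted arithmetic.
TWO_BYTE = 256 * 256 - 2048   # 63,488 code points stored in one 16-bit unit
FOUR_BYTE = 1024 * 1024       # 1,048,576 code points stored as a pair of units
TOTAL = TWO_BYTE + FOUR_BYTE  # 1,112,064


def to_utf16_units(cp):
    """Hypothetical helper: encode one code point as a list of UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} cannot be encoded")
    if cp < 0x10000:
        return [cp]                    # fits in one two-byte unit
    offset = cp - 0x10000              # 20-bit value: 0 .. 0xFFFFF
    high = 0xD800 + (offset >> 10)     # top 10 bits pick a "first half" code
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits pick a "second half" code
    return [high, low]


def codec_units(cp):
    """Same result via Python's built-in UTF-16 (big-endian, no BOM) codec."""
    data = chr(cp).encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


print(TOTAL)                                      # 1112064
print([hex(u) for u in to_utf16_units(0x13000)])  # ['0xd80c', '0xdc00']
for cp in (0x41, 0xE9, 0x65E5, 0x13000, 0x10FFFF):
    assert to_utf16_units(cp) == codec_units(cp)
```

U+13000 is the first code point of the Egyptian Hieroglyphs block, so it is one of the 1,048,576 characters that need four bytes in UTF-16.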


Thanks for that.
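
To make the size and indexing remarks in the quoted message concrete, here is a minimal Python 3 sketch using only built-in codecs. The sample strings and the helper names `encoded_sizes` and `nth_char_utf32` are illustrative assumptions, not anything from the thread. The "-le" codec variants are used so that no byte-order mark is counted.

```python
def encoded_sizes(text):
    """Byte length of text in each encoding (little-endian, no byte-order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def nth_char_utf32(data, n):
    """UTF-32 is fixed width, so character n always starts at byte 4 * n."""
    return data[4 * n : 4 * n + 4].decode("utf-32-le")


samples = {
    "ASCII": "hello",
    "Greek": "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1",  # 6 letters, all below U+0800
    "Japanese": "\u65e5\u672c\u8a9e",                # 3 characters in U+0800..U+FFFF
    "Hieroglyph": "\U00013000",                      # 1 character above U+FFFF
}
for name, text in samples.items():
    print(name, encoded_sizes(text))

mixed = "a\U00013000b".encode("utf-32-le")
print(nth_char_utf32(mixed, 2))  # b
```

Here ASCII favours UTF-8, Greek ties between UTF-8 and UTF-16, the Japanese sample is smaller in UTF-16, and the hieroglyph needs four bytes in all three. In UTF-16 the same `mixed` string would put "b" at code unit 3 rather than 2, because the hieroglyph occupies two units, which is why direct indexing only works there when every character is in the base 63,488.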

-- 
Michael

All shall be well, and all shall be well, and all manner of things shall
be well

 - Julian of Norwich 1342 - 1416

