Jacob Lund wrote:

Ok! Let me see if I can explain myself - I am not an expert on this so
please correct me if I am wrong!

A UTF-8 representation of one character consists of a combination of
characters. Now Java is a Unicode language and this means that one character

...of bytes.


can represent "any" type of character in the world!

Almost. Java's characters are only 16 bits wide, so there is a class of Unicode characters (those with code points above U+FFFF) that must be represented as a sequence of two Java characters, a so-called surrogate pair.
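
To make that concrete, here is a small sketch. The class and method names (SurrogateDemo, charsNeeded) are made up for illustration, and the two code points are just examples I picked; it assumes Java 5 or later, where the code point methods on Character and String exist.

```java
class SurrogateDemo {
    // Hypothetical helper: how many Java chars (UTF-16 code units)
    // are needed to hold a single Unicode code point.
    public static int charsNeeded(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        int euro = 0x20AC;   // EURO SIGN, fits in one 16-bit char
        int clef = 0x1D11E;  // MUSICAL SYMBOL G CLEF, above U+FFFF

        System.out.println(charsNeeded(euro)); // 1
        System.out.println(charsNeeded(clef)); // 2

        String s = new String(Character.toChars(clef));
        // length() counts chars, not code points
        System.out.println(s.length());                             // 2
        System.out.println(s.codePointCount(0, s.length()));        // 1
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
    }
}
```

So String.length() can be larger than the number of characters a reader would count.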


Basically UTF-8 only makes sense when working on an "old" 7 bit ASCII system
and you need to use characters not available in the given codepage.

UTF-8 always makes sense when you need backward compatibility with ASCII.
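
What "backward compatible" means here can be checked directly: text that is pure ASCII produces exactly the same bytes when encoded as UTF-8. The sketch below is only an illustration; the class and method names are invented, and it assumes Java 7 or later for java.nio.charset.StandardCharsets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiCompatDemo {
    // Hypothetical helper: true if the text encodes to identical bytes
    // in US-ASCII and in UTF-8.
    public static boolean sameBytesInAsciiAndUtf8(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }

    public static void main(String[] args) {
        // Pure ASCII text: both encodings give the same bytes.
        System.out.println(sameBytesInAsciiAndUtf8("Hello, world")); // true

        // "Gruesse" written with u-umlaut and sharp s (as escapes, so the
        // source file itself stays ASCII). US-ASCII cannot represent them
        // and substitutes '?', while UTF-8 uses two bytes for each.
        System.out.println(sameBytesInAsciiAndUtf8("Gr\u00FC\u00DFe")); // false
    }
}
```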


Both UTF-8 and UTF-16 use a varying number of bytes to represent one
character, where Unicode always uses 32 bit characters (maybe it is 24 bit).

Unicode doesn't "represent" at all. Unicode is just a definition of code points.


*Encodings* represent Unicode characters as byte sequences, and UTF-8 and UTF-16 are two of the Unicode encodings.
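
As a rough illustration of the difference between a code point and its encoded form, the sketch below encodes a few code points with both encodings and prints the byte counts. The names (EncodingLengthDemo, encodedLength) and the sample code points are my own choices; it assumes Java 7 or later and uses UTF-16BE so that no byte order mark is added to the output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingLengthDemo {
    // Hypothetical helper: number of bytes one code point occupies
    // in the given encoding.
    public static int encodedLength(int codePoint, Charset charset) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // 'A', e-acute, euro sign, musical G clef
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1D11E};
        for (int cp : samples) {
            System.out.printf("U+%04X: UTF-8=%d bytes, UTF-16BE=%d bytes%n",
                    cp,
                    encodedLength(cp, StandardCharsets.UTF_8),
                    encodedLength(cp, StandardCharsets.UTF_16BE));
        }
    }
}
```

The code point is the same in each row; only the byte sequence chosen by the encoding changes (1 to 4 bytes in UTF-8, 2 or 4 bytes in UTF-16).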

> ...

Julian

--
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760
