At 11:12 AM -0400 10/23/07, Michael B Allen wrote:
On 10/23/07, tedd <[EMAIL PROTECTED]> wrote:
At 7:21 PM -0400 10/21/07, John Campbell wrote:
>The first thing to understand about character encoding is the overlap
>between UTF-8 and 8859-1. Below is a sample
>a - lower case a (Same in 8859-1 & UTF-8)
>à - a grave (Available in 8859-1 & UTF-8, but encoded as different byte values)
>éí - Chinese character (Not in 8859-1, in UTF-8)
A small clarification -- it's not really overlap,
but rather UTF-8 is a super-set containing 8859-1,
just as both contain ASCII.
Well if you want to be pedantic about it, "overlap" is more accurate.
UTF-8 is a multibyte encoding of the Unicode charset. ISO-8859-1 is a
single byte encoding of the ISO-8859-1 charset. So yes, Unicode is a
superset of ISO-8859-1, but the UTF-8 encodings of values above 0x7f are
not the same.
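To make the point above concrete, here is a minimal sketch in Python (chosen only for brevity; the helper name `encodings` is made up for this illustration). It prints the code point and the encoded bytes of two of the sample characters under both encodings, assuming Python's standard "iso-8859-1" and "utf-8" codecs.

```python
# Compare how the same characters are encoded in ISO-8859-1 and UTF-8.
# 'encodings' is a hypothetical helper name used only for this illustration.

def encodings(ch):
    """Return (code point, ISO-8859-1 bytes, UTF-8 bytes) for one character."""
    return ord(ch), ch.encode("iso-8859-1"), ch.encode("utf-8")

# 'a' (U+0061) and 'a with grave accent' (U+00E0)
samples = ["a", "\u00e0"]

for ch in samples:
    cp, latin1, utf8 = encodings(ch)
    # Show the code point plus the hex of each encoded form.
    print(f"U+{cp:04X}  8859-1={latin1.hex()}  UTF-8={utf8.hex()}")
```

The code point is the same in both cases; only the byte sequence for the character above 0x7f differs (one byte in ISO-8859-1, two bytes in UTF-8).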
Mike
You are free to call it what you want.
True, the byte values for the ISO-8859-1 charset
above 0x7F (the M$ spin) are not the same as
in UTF-* et al, but the glyphs are still included in
UTF-8 regardless of encoding differences -- is
that not true?
If this is true, then the term "overlap" would be
less correct than "super-set" because the two
sets do not overlap with respect to all
byte values -- but the larger one still contains
all the glyphs that the smaller one does (with the
exception of Apple's spin on that set, which
included adding their logo).
That's the reason I'm free to call one a super-set of the other.
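Both views can be checked with a rough sketch like the one below (Python again; `compare_all_bytes` is a hypothetical name). It assumes Python's "iso-8859-1" codec, which maps all 256 byte values, including the control ranges, to the Unicode code points of the same number. It counts how many byte values keep the same code point in Unicode and how many keep the same single-byte form in UTF-8.

```python
# Count, over every possible ISO-8859-1 byte, how often the Unicode code
# point matches the byte value and how often the UTF-8 bytes match it.
# 'compare_all_bytes' is a hypothetical name used only for this sketch.

def compare_all_bytes():
    same_code_point = 0
    same_bytes = 0
    for b in range(256):
        ch = bytes([b]).decode("iso-8859-1")
        # Superset check: the character exists in Unicode at the same number.
        if ord(ch) == b:
            same_code_point += 1
        # Overlap check: the UTF-8 encoding is that identical single byte.
        if ch.encode("utf-8") == bytes([b]):
            same_bytes += 1
    return same_code_point, same_bytes

print(compare_all_bytes())  # (256, 128)
```

So every character is present in the larger set, while the encoded bytes agree only for the ASCII half.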
I believe it's easier to explain char-sets and
code-points in terms of current Unicode standards
than it is to point out historical differences
that are diminishing in importance as more people
convert.
Cheers,
tedd
--
-------
http://sperling.com http://ancientstones.com http://earthstones.com
_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk