Re: [nyphp-talk] Character set issues revisited

tedd Thu, 25 Oct 2007 07:50:50 -0700

At 11:12 AM -0400 10/23/07, Michael B Allen wrote:

On 10/23/07, tedd <[EMAIL PROTECTED]> wrote:

 At 7:21 PM -0400 10/21/07, John Campbell wrote:
 >The first thing to understand about character encoding is the overlap
 >between UTF-8 and 8859-1.  Below is a sample
 >a - lower case a (Same in 8859-1 & UTF-8)
 >à - a acute (Available in 8859-1 & UTF8 but different values..)
 >éí - Chinese character (Not in 8859-1, in UTF-8)


 A small clarification -- it's not really overlap,
 but rather UTF-8 is a super-set containing 8859-1
 like both contain ASCII.


Well if you want to be pedantic about it, "overlap" is more accurate.
UTF-8 is a multibyte encoding of the Unicode charset. ISO-8859-1 is a
single byte encoding of the ISO-8859-1 charset. So yes, Unicode is a
superset of ISO-8859-1 but the UTF-8 encoding of values above 0x7f are
not the same.

Mike
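
Mike's point about values above 0x7f can be checked directly. Below is a minimal Python sketch, not anything from the thread itself, that encodes the three kinds of sample characters in both encodings. The CJK character is an arbitrary stand-in chosen for illustration, and the helper name encode_or_none is hypothetical.

```python
# Compare how three sample characters are stored in ISO-8859-1 and UTF-8.
samples = {
    "a": "a",             # plain ASCII
    "a-grave": "\u00e0",  # present in ISO-8859-1, above 0x7F
    "cjk": "\u4e2d",      # arbitrary CJK character, chosen for illustration
}

def encode_or_none(text, encoding):
    """Return the encoded bytes, or None if the charset lacks the character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

# Map each sample name to its (ISO-8859-1 bytes, UTF-8 bytes) pair.
table = {
    name: (encode_or_none(ch, "iso-8859-1"), encode_or_none(ch, "utf-8"))
    for name, ch in samples.items()
}

for name, (latin1, utf8) in table.items():
    print(name, latin1, utf8)
```

The ASCII letter comes out as the same single byte in both, the accented letter is one byte in ISO-8859-1 but two in UTF-8, and the CJK character has no ISO-8859-1 encoding at all.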



You are free to call it what you want.

True, the code-points for the ISO-8859-1 charset above 0x7F (the M$ spin) are not the same as UTF-* et al, but the glyphs are still included in UTF-8 regardless of encoding differences -- is that not true?

If this is true, then the term "overlap" would be less correct than "super-set" because the two sets do not overlap with respect to all code-points -- but the larger one still contains all the glyphs that the smaller one does (with the exception of Apple's spin on that set, which included adding their logo).


That's the reason I'm free to call one a super-set of the other.

I believe it's easier to explain char-sets and code-points in terms of current Unicode standards than it is to point out historical differences that are diminishing in importance as more people convert.
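
The super-set question can also be tested rather than argued. Here is a rough Python sketch, under two assumptions: that "the M$ spin" refers to Windows-1252 (Python's "cp1252" codec), and that comparing code-point numbers is a fair way to read "contains". As far as Unicode goes, its first 256 code points coincide with ISO-8859-1, so what differs above 0x7F is the UTF-8 byte sequence rather than the code point.

```python
# Every ISO-8859-1 byte value decodes to the Unicode code point of the same number.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")
same_code_points = all(ord(ch) == i for i, ch in enumerate(decoded))

# Count how many of those 256 characters keep the same single byte in UTF-8.
unchanged = sum(
    1 for i, ch in enumerate(decoded) if ch.encode("utf-8") == bytes([i])
)

# Windows-1252 (assumed to be "the M$ spin") uses 0x80 for the euro sign,
# which lives at a different code point in Unicode.
euro = bytes([0x80]).decode("cp1252")

print(same_code_points, unchanged, hex(ord(euro)))
```

Only the 128 ASCII values survive byte-for-byte in UTF-8; the upper 128 characters are all still representable, just as two-byte sequences.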


Cheers,

tedd
--
-------
http://sperling.com  http://ancientstones.com  http://earthstones.com
_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

