Well, if the point is to refer to characters that would require two or more code units in UTF-8, then _accurate_ expressions would be, "Unicode characters beyond the Basic Latin block" or "Unicode characters above U+007F".

Peter

-----Original Message-----
From: Steve Swales [mailto:st...@swales.us]
Sent: Sunday, September 20, 2015 11:00 AM
To: Phillips, Addison <addi...@lab126.com>
Cc: Peter Constable <peter...@microsoft.com>; Sean Leonard <lists+unic...@seantek.com>; unicode@unicode.org
Subject: Re: Concise term for non-ASCII Unicode characters

Exactly. I think the reason that non-ASCII feels non-concise is that there is widespread confusion between ASCII and Latin-1/ISO 8859-1 (which in turn is widely confused with Windows-1252).

-steve

Sent from my iPhone

> On Sep 20, 2015, at 10:05 AM, Phillips, Addison <addi...@lab126.com> wrote:
>
> I agree, although I note that sometimes the additional (redundant)
> specificity of "non-7-bit-ASCII characters" is needed when talking to people
> unclear on what "ASCII" means.
>
> Addison
>
>> -----Original Message-----
>> From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Peter
>> Constable
>> Sent: Sunday, September 20, 2015 9:52 AM
>> To: Sean Leonard; unicode@unicode.org
>> Subject: RE: Concise term for non-ASCII Unicode characters
>>
>> You already have been using "non-ASCII Unicode", which is about as
>> concise and sufficiently accurate as you'll get. There's no term
>> specifically defined in any standard or conventionally used for this.
>>
>>
>> Peter
>>
>> -----Original Message-----
>> From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Sean
>> Leonard
>> Sent: Sunday, September 20, 2015 7:48 AM
>> To: unicode@unicode.org
>> Subject: Concise term for non-ASCII Unicode characters
>>
>> What is the most concise term for characters or code points outside
>> of the US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to
>> these as "extended characters" or "non-ASCII Unicode" but I do not
>> find those terms precise. We are talking about the code points U+0080
>> - U+10FFFF. I suppose that this also refers to code points/scalar
>> values that are not formally Unicode characters, such as U+FFFF.
>>
>> Basically, I am looking for a concise term for values that would
>> require multiple UTF-8 octets if encoded in UTF-8 (without referring to
>> UTF-8 encoding specifically).
>> "Non-ASCII" is not precise enough since character sets like Shift-JIS
>> are non-ASCII.
>>
>> Also a citation to a relevant standard (whether Unicode or otherwise)
>> would be helpful.
>>
>> The terms "supplementary character" and "supplementary code point"
>> are defined in the Unicode standard, referring to characters or code
>> points above U+FFFF. I am looking for something like those, but for
>> characters or code points above U+007F.
>>
>> Thank you,
>>
>> Sean
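
For anyone who wants to check the boundary discussed in this thread, here is a small Python sketch. It is not from any of the messages above, and the function names utf8_octets and is_beyond_basic_latin are made up for illustration. It only shows that scalar values up to U+007F take one UTF-8 octet and everything above takes two to four, including a noncharacter such as U+FFFF. Surrogate code points (U+D800 - U+DFFF) are left out on purpose, since they are not scalar values and Python refuses to encode them.

```python
def utf8_octets(code_point: int) -> int:
    """Return how many octets UTF-8 uses for one Unicode scalar value.

    Hypothetical helper for illustration; surrogate code points
    (U+D800..U+DFFF) would raise UnicodeEncodeError here.
    """
    return len(chr(code_point).encode("utf-8"))


def is_beyond_basic_latin(code_point: int) -> bool:
    """True for code points above U+007F, i.e. outside the Basic Latin block."""
    return code_point > 0x7F


# Sample values on both sides of the U+007F boundary, plus the
# noncharacter U+FFFF and the last code point U+10FFFF.
for cp in (0x0041, 0x007F, 0x0080, 0x00E9, 0xFFFF, 0x10FFFF):
    print(
        f"U+{cp:04X}: {utf8_octets(cp)} octet(s), "
        f"beyond Basic Latin: {is_beyond_basic_latin(cp)}"
    )
```

In this sketch the two tests agree for every sampled value: a code point needs more than one UTF-8 octet exactly when it is above U+007F.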