Re: Smiles, faces, etc

2002-02-16 Thread Michael (michka) Kaplan
From: "Christopher J Fynn" <[EMAIL PROTECTED]> A few pedantic corrections follow. > IMO attention needs to be paid to making > sure all these characters are encoded before we start > bothering with Klingon, smileys, & etc. Klingon was rejected. A proposal for smileys would be a welcome chuckler

RE: Unicode and end users

2002-02-16 Thread Yves Arrouye
> If "foo" is a US-ASCII string, "grep foo file" will work fine with any > US-ASCII-superset charset for which non-ASCII characters do not use > bytes < 0x80, including the hypothetical one I described, with no > possibility of a false match. However "grep fóó file" will work only > if the current

Re: Smiles, faces, etc

2002-02-16 Thread Asmus Freytag
Whether or not they would get support to be encoded is almost irrelevant as long as no-one comes forward and makes a formal proposal with solid background information. Only then can this issue be settled where it matters: in the UTC. Discussions on open lists like this, unless accompanied by f

Re: Unicode and end users

2002-02-16 Thread Asmus Freytag
At 12:37 PM 2/16/02 -0800, Doug Ewell wrote: >Why would anyone, faced with a UTF-8 file that contains invalid >sequences, want to retain the invalid sequences, much less convert the >file to another encoding form that either (a) preserves the invalid >sequences or (b) leaves a marker showing where

SV: Analysis of ISO 639 and mappings to SIL Ethnologue

2002-02-16 Thread Audun H. Lona
The Norwegian and Sami language pages on this web site are unfortunately so full of errors that they should be removed or corrected immediately to avoid spreading misleading information. An example on http://www.ethnologue.com/show_language.asp?code=LPR Dialects RUIJA, TORNE, SEA LAP

Re: Unicode and end users

2002-02-16 Thread Doug Ewell
David Hopwood <[EMAIL PROTECTED]> wrote: > [I've thought about this a bit more, and I'm now convinced that it's > useful to have a separate, standardised code for this - say > U+FDEF ILL-FORMED INPUT MARKER. (Can noncharacters have names?) Nope. They're noncharacters. They do not exist; they n

Re: Unicode and end users

2002-02-16 Thread David Starner
On Fri, Feb 15, 2002 at 02:57:46PM +, David Hopwood wrote: > Not having to add a few more lines of code to grep and sed is a good > trade-off for a 50% penalty in encoding efficiency for Indic & Southeast > Asian scripts, Katakana, Hiragana and a few others? I don't think so. Not complicating

Re: Unicode and end users

2002-02-16 Thread David Hopwood
David Starner wrote: > On Thu, Feb 14, 2002 at 03:15:24PM +, David Hopwood wrote: > > [re: a hypothetical charset that has almost all the properties of UTF-8] > > > > (The exception is that naïve substring searching could find a > > match starting part-way t

About Unicode encodings

2002-02-16 Thread Miguel Angel Suárez
In these notes I want to discuss the notion of "encoding" and introduce a series of concepts that I find useful. My intention is to try to identify abstract properties of encodings and to be able to classify encodings according to these properties. In its most general terms, an encoding is a binary

Re: Smiles, faces, etc

2002-02-16 Thread Patrick Andries
Christopher J Fynn wrote: > >Patrick, > >There are whole scripts for contemporary languages which >are as yet unencoded in the Unicode Standard and some >punctuation and other characters missing from already >encoded scripts. IMO attention needs to be paid to making >sure all these charac

Re: Smiles, faces, etc

2002-02-16 Thread Lukas Pietsch
"Falkor" wrote: > I was thinking more that this would allow modern software to translate a > lower-ASCII three-character sequence into a single unicode emoticon > character that would be displayed properly regardless of OS and software, > also alleviating the need for such developers to create pr

RE: Smiles, faces, etc

2002-02-16 Thread Christopher J Fynn
Patrick Andries wrote: << I wonder sometimes if the largest obstacle in the encoding of smileys as characters is not the "universal" normalization process itself. Had they been invented a few decades ago and encoded "locally" in some kind of popular font/encoding (the Netscape font for