RE: Smiles, faces, etc

2002-02-16 Thread Christopher J Fynn
Patrick Andries wrote: I wonder sometimes if the largest obstacle in the encoding of smileys as characters is not the universal normalization process itself. Had they been invented a few decades ago and encoded locally in some kind of popular font/encoding (the Netscape font for

Re: Smiles, faces, etc

2002-02-16 Thread Lukas Pietsch
Falkor wrote: I was thinking more that this would allow modern software to translate a lower-ASCII three-character sequence into a single unicode emoticon character that would be displayed properly regardless of OS and software, also alleviating the need for such developers to create

Re: Smiles, faces, etc

2002-02-16 Thread Patrick Andries
Christopher J Fynn wrote: Patrick, There are whole scripts for contemporary languages which are as yet unencoded in the Unicode Standard and some punctuation and other characters missing from already encoded scripts. IMO attention needs to be paid to making sure all these characters

About Unicode encodings

2002-02-16 Thread Miguel Angel Suárez
In these notes I want to discuss the notion of encoding and introduce a series of concepts that I find useful. My intention is to try to identify abstract properties of encodings and to be able to classify encodings according to these properties. In its most general terms, an encoding is a binary

Re: Unicode and end users

2002-02-16 Thread David Hopwood
David Starner wrote: On Thu, Feb 14, 2002 at 03:15:24PM +, David Hopwood wrote: [re: a hypothetical charset that has almost all the properties of UTF-8] (The exception is that naïve substring searching could find a match starting part-way through a

Re: Unicode and end users

2002-02-16 Thread David Starner
On Fri, Feb 15, 2002 at 02:57:46PM +, David Hopwood wrote: Not having to add a few more lines of code to grep and sed is a good trade-off for a 50% penalty in encoding efficiency for Indic Southeast Asian scripts, Katakana, Hiragana and a few others? I don't think so. Not complicating

Re: Unicode and end users

2002-02-16 Thread Doug Ewell
David Hopwood wrote: [I've thought about this a bit more, and I'm now convinced that it's useful to have a separate, standardised code for this - say U+FDEF ILL-FORMED INPUT MARKER. (Can noncharacters have names?) Nope. They're noncharacters. They do not exist; they never

Re: Smiles, faces, etc

2002-02-16 Thread Asmus Freytag
Whether or not they would get support to be encoded is almost irrelevant as long as no-one comes forward and makes a formal proposal with solid background information. Only then can this issue be settled where it matters: in the UTC. Discussions on open lists like this, unless accompanied by

Re: Unicode and end users

2002-02-16 Thread Asmus Freytag
At 12:37 PM 2/16/02 -0800, Doug Ewell wrote: Why would anyone, faced with a UTF-8 file that contains invalid sequences, want to retain the invalid sequences, much less convert the file to another encoding form that either (a) preserves the invalid sequences or (b) leaves a marker showing where

RE: Unicode and end users

2002-02-16 Thread Yves Arrouye
If foo is a US-ASCII string, grep foo file will work fine with any US-ASCII-superset charset for which non-ASCII characters do not use bytes below 0x80, including the hypothetical one I described, with no possibility of a false match. However grep fóó file will work only if the current shell
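
The byte-range property discussed in this excerpt can be checked directly. What follows is only a minimal sketch in Python, not anything posted in the thread: the helper names ascii_safe and naive_match are made up for illustration, and Shift_JIS is brought in purely as a familiar counter-example of a charset whose trailing bytes can fall in the ASCII range.

```python
def ascii_safe(text: str, codec: str) -> bool:
    """Return True if every non-ASCII character in `text` encodes,
    under `codec`, to bytes that are all >= 0x80."""
    return all(
        byte >= 0x80
        for ch in text
        if ord(ch) >= 0x80          # look only at non-ASCII characters
        for byte in ch.encode(codec)
    )


def naive_match(needle: str, haystack: str, codec: str) -> bool:
    """Byte-level substring search, roughly what a charset-unaware
    grep does: encode both sides and compare raw bytes."""
    return needle.encode(codec) in haystack.encode(codec)


# UTF-8: non-ASCII characters never produce a byte below 0x80,
# so an ASCII needle cannot match inside a multi-byte character.
print(ascii_safe("fóó ソ", "utf-8"))          # True
print(naive_match("\\", "ソ", "utf-8"))       # False

# Shift_JIS (assumed counter-example): KATAKANA LETTER SO is 0x83 0x5C,
# and 0x5C is also the byte for the ASCII backslash.
print(ascii_safe("ソ", "shift_jis"))          # False
print(naive_match("\\", "ソ", "shift_jis"))   # True, a false match
```

With a charset that passes the ascii_safe check, an ASCII-only pattern such as foo can be searched byte by byte without decoding. A pattern containing non-ASCII characters still has to be encoded in the same charset as the file before the comparison means anything.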