Patrick Andries wrote:
I wonder sometimes if the largest obstacle in the encoding
of smileys as characters is not the universal normalization
process itself. Had they been invented a few decades ago and
encoded locally in some kind of popular font/encoding (the
Netscape font for
Falkor wrote:
I was thinking more that this would allow modern software to translate a
lower-ASCII three-character sequence into a single unicode emoticon
character that would be displayed properly regardless of OS and software,
also alleviating the need for such developers to create
Christopher J Fynn wrote:
Patrick,
There are whole scripts for contemporary languages which
are as yet unencoded in the Unicode Standard and some
punctuation and other characters missing from already
encoded scripts. IMO attention needs to be paid to making
sure all these characters
In these notes I want to discuss the notion of encoding and introduce a
series of concepts that I find useful. My intention is to try to identify
abstract properties of encodings and to be able to classify encodings
according to these properties.
In its most general terms, an encoding is a binary
David Starner wrote:
On Thu, Feb 14, 2002 at 03:15:24PM +, David Hopwood wrote:
[re: a hypothetical charset that has almost all the properties of UTF-8]
(The exception is that naïve substring searching could find a
match starting part-way through a
On Fri, Feb 15, 2002 at 02:57:46PM +, David Hopwood wrote:
Not having to add a few more lines of code to grep and sed is a good
trade-off for a 50% penalty in encoding efficiency for Indic Southeast
Asian scripts, Katakana, Hiragana and a few others? I don't think so.
Not complicating
David Hopwood wrote:
[I've thought about this a bit more, and I'm now convinced that it's
useful to have a separate, standardised code for this - say
U+FDEF ILL-FORMED INPUT MARKER. (Can noncharacters have names?)
Nope. They're noncharacters. They do not exist; they never
Whether or not they would get support to be encoded is almost irrelevant as
long as no-one comes forward and makes a formal proposal with solid
background information. Only then can this issue be settled where it
matters: in the UTC.
Discussions on open lists like this, unless accompanied by
At 12:37 PM 2/16/02 -0800, Doug Ewell wrote:
Why would anyone, faced with a UTF-8 file that contains invalid
sequences, want to retain the invalid sequences, much less convert the
file to another encoding form that either (a) preserves the invalid
sequences or (b) leaves a marker showing where
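The two options named in this excerpt, (a) preserving the invalid sequences and (b) leaving a marker where they were, can be illustrated with a minimal sketch. It assumes Python's built-in UTF-8 codec and its standard `replace` and `surrogateescape` error handlers; it is not the mechanism anyone in the thread proposed, and the sample bytes and variable names are hypothetical.

```python
# Hypothetical sample: ASCII text with one stray byte.
# 0xFF can never appear in well-formed UTF-8.
data = b"abc\xffdef"

# Option (b): leave a marker where the invalid sequence was.
# The "replace" handler substitutes U+FFFD REPLACEMENT CHARACTER.
marked = data.decode("utf-8", errors="replace")

# Option (a): preserve the invalid byte so the original can be recovered.
# The "surrogateescape" handler maps it to a lone surrogate code point.
preserved = data.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes.
restored = preserved.encode("utf-8", errors="surrogateescape")
```

The marker approach loses the original byte, while the escaping approach keeps it at the cost of producing a string that is not valid for interchange.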
If foo is a US-ASCII string, grep foo file will work fine with any
US-ASCII-superset charset for which non-ASCII characters do not use
bytes < 0x80, including the hypothetical one I described, with no
possibility of a false match. However grep fóó file will work only
if the current shell
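The claim about searching for a US-ASCII string can be checked with a small sketch. It uses UTF-8 as one example of an ASCII-superset charset in which non-ASCII characters are encoded only with bytes 0x80 and above; the helper `find_ascii` and the sample text are hypothetical and stand in for what grep does at the byte level.

```python
def find_ascii(needle: str, haystack: bytes) -> list[int]:
    """Return the byte offsets at which an ASCII needle occurs in haystack."""
    # Raises UnicodeEncodeError if the needle is not pure ASCII.
    pattern = needle.encode("ascii")
    offsets = []
    start = haystack.find(pattern)
    while start != -1:
        offsets.append(start)
        start = haystack.find(pattern, start + 1)
    return offsets


# Hypothetical sample text mixing ASCII and non-ASCII characters.
text = "f\u00f3\u00f3 foo na\u00efve foo"
utf8 = text.encode("utf-8")

# A plain byte search, with no knowledge of the encoding.
hits = find_ascii("foo", utf8)

# Every hit falls on a character boundary: the prefix before it decodes cleanly.
prefixes = [utf8[:offset].decode("utf-8") for offset in hits]
```

Because no byte of a multi-byte sequence is below 0x80, an all-ASCII pattern cannot match inside one, so the byte search needs no decoding step. A non-ASCII needle such as `fóó` first has to be encoded in the same charset as the file, which is the case the excerpt goes on to discuss.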
10 matches