From: "Christopher J Fynn" <[EMAIL PROTECTED]>
A few pedantic corrections follow.
> IMO attention needs to be paid to making
> sure all these characters are encoded before we start
> bothering with Klingon, smileys, & etc.
Klingon was rejected. A proposal for smileys would be a welcome chuckler
> If "foo" is a US-ASCII string, "grep foo file" will work fine with any
> US-ASCII-superset charset for which non-ASCII characters do not use
> bytes < 0x80, including the hypothetical one I described, with no
> possibility of a false match. However "grep fóó file" will work only
> if the current
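The claim about ASCII needles can be checked directly on UTF-8, which is one such ASCII-superset charset: every byte of a multibyte sequence is >= 0x80, so a byte-level search for an ASCII string cannot land inside a non-ASCII character. The sketch below is a minimal illustration under that assumption; `find_all` and `non_ascii_bytes_are_high` are hypothetical helper names. It also shows why a non-ASCII needle such as "fóó" only matches when it is encoded in the same charset as the file.

```python
def find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Naive byte-level search (roughly what grep does on raw bytes): every offset of needle."""
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits


def non_ascii_bytes_are_high(text: str) -> bool:
    """True if every byte used to encode a non-ASCII character in UTF-8 is >= 0x80."""
    return all(b >= 0x80 for ch in text if ord(ch) > 0x7F for b in ch.encode("utf-8"))


# "foo café fóó foo", written with escapes to avoid normalization surprises
data = "foo caf\u00e9 f\u00f3\u00f3 foo".encode("utf-8")
needle_utf8 = "f\u00f3\u00f3".encode("utf-8")
needle_latin1 = "f\u00f3\u00f3".encode("latin-1")

print(find_all(data, b"foo"))         # [0, 16]: only the real "foo"s match
print(find_all(data, needle_utf8))    # [10]: needle in the same charset as the data
print(find_all(data, needle_latin1))  # []: needle in a different charset
print(non_ascii_bytes_are_high("caf\u00e9 \u263a \u0928\u092e"))  # True
```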
Whether or not they would get support to be encoded is almost irrelevant as
long as no-one comes forward and makes a formal proposal with solid
background information. Only then can this issue be settled where it
matters: in the UTC.
Discussions on open lists like this, unless accompanied by f
At 12:37 PM 2/16/02 -0800, Doug Ewell wrote:
>Why would anyone, faced with a UTF-8 file that contains invalid
>sequences, want to retain the invalid sequences, much less convert the
>file to another encoding form that either (a) preserves the invalid
>sequences or (b) leaves a marker showing where
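The options Doug lists map fairly closely onto Python's UTF-8 error handlers, which makes the trade-off concrete. This is a sketch, not a recommendation: "strict" rejects the file, "replace" leaves a U+FFFD marker where each bad sequence was (option b), and "surrogateescape" preserves the bad bytes as lone surrogates (option a). The helper name `decode_utf8` is hypothetical. Note that the preserved form cannot be converted to UTF-16, which is one concrete cost of carrying invalid sequences forward.

```python
def decode_utf8(raw: bytes, policy: str) -> str:
    """Decode raw bytes as UTF-8 with one of Python's built-in error handlers.

    "strict"          -> reject the input (raises UnicodeDecodeError)
    "replace"         -> put U+FFFD where each ill-formed sequence was
    "surrogateescape" -> keep each bad byte as a lone surrogate U+DC80..U+DCFF
    """
    return raw.decode("utf-8", errors=policy)


# 0xFF never occurs in UTF-8; the trailing 0xC3 is a truncated two-byte sequence.
raw = b"abc\xffdef\xc3"

try:
    decode_utf8(raw, "strict")
except UnicodeDecodeError:
    print("strict: rejected")

marked = decode_utf8(raw, "replace")
print(ascii(marked))  # 'abc\ufffddef\ufffd'

escaped = decode_utf8(raw, "surrogateescape")
print(escaped.encode("utf-8", "surrogateescape") == raw)  # True: the bad bytes round-trip

try:
    escaped.encode("utf-16")  # lone surrogates are not valid in UTF-16
except UnicodeEncodeError:
    print("utf-16: cannot carry the preserved bytes")
```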
The Norwegian and Sami language pages on this web site are unfortunately
so full of errors that they should be removed or corrected immediately
to avoid spreading misleading information.
An example on http://www.ethnologue.com/show_language.asp?code=LPR
Dialects RUIJA, TORNE, SEA LAP
David Hopwood <[EMAIL PROTECTED]> wrote:
> [I've thought about this a bit more, and I'm now convinced that it's
> useful to have a separate, standardised code for this - say
> U+FDEF ILL-FORMED INPUT MARKER. (Can noncharacters have names?)
Nope. They're noncharacters. They do not exist; they n
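For reference, the set of noncharacters is small and fixed: U+FDD0..U+FDEF (which includes the U+FDEF proposed above) plus the last two code points of each of the 17 planes, 66 in all. A minimal check, with `is_noncharacter` as a hypothetical helper name, also shows that Python's Unicode database assigns them no name:

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Unicode noncharacters: U+FDD0..U+FDEF and U+xFFFE / U+xFFFF in every plane."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is not a Unicode code point")
    # (cp & 0xFFFE) == 0xFFFE holds exactly for U+xxFFFE and U+xxFFFF
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


print(is_noncharacter(0xFDEF))                 # True
print(is_noncharacter(0xFFFD))                 # False: REPLACEMENT CHARACTER is a real character
print(unicodedata.name(chr(0xFDEF), None))     # None: noncharacters have no names
print(sum(map(is_noncharacter, range(0x110000))))  # 66
```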
On Fri, Feb 15, 2002 at 02:57:46PM +, David Hopwood wrote:
> Not having to add a few more lines of code to grep and sed is a good
> trade-off for a 50% penalty in encoding efficiency for Indic & Southeast
> Asian scripts, Katakana, Hiragana and a few others? I don't think so.
Not complicating
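The 50% figure follows from UTF-8 spending three bytes on every code point in U+0800..U+FFFF, where a 16-bit form spends two. The sketch below uses UTF-16 as the stand-in for a two-byte form (an assumption; the hypothetical charset under discussion is not spelled out in this excerpt) and measures a few short sample words.

```python
SAMPLES = {
    "Devanagari": "\u0928\u092e\u0938\u094d\u0924\u0947",  # a Hindi greeting, 6 code points
    "Katakana": "\u30ab\u30bf\u30ab\u30ca",                # "katakana", 4 code points
    "Thai": "\u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22",  # "Thai language", 7 code points
}


def byte_lengths(text: str) -> tuple[int, int]:
    """Return (UTF-8 length, UTF-16 length without BOM) in bytes."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in SAMPLES.items():
    u8, u16 = byte_lengths(text)
    print(f"{name}: UTF-8 {u8} bytes, UTF-16 {u16} bytes, ratio {u8 / u16:.2f}")
```

Each script here lies entirely in U+0800..U+FFFF, so each ratio comes out at exactly 1.50; text mixed with ASCII spaces or digits would narrow the gap.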
David Starner wrote:
> On Thu, Feb 14, 2002 at 03:15:24PM +, David Hopwood wrote:
> > [re: a hypothetical charset that has almost all the properties of UTF-8]
> >
> > (The exception is that naïve substring searching could find a
> > match starting part-way t
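UTF-8 avoids this problem because continuation bytes (0b10xxxxxx) are distinguishable from lead bytes, so a byte-level hit can be tested for starting on a character boundary. The sketch below (hypothetical helpers `is_continuation` and `matches_on_boundary`) filters hits that way. With a well-formed UTF-8 needle the filter never fires, since such a needle always begins with a non-continuation byte; it only rejects fragments such as a lone continuation byte.

```python
def is_continuation(b: int) -> bool:
    """True for UTF-8 continuation bytes, i.e. bit pattern 10xxxxxx."""
    return b & 0xC0 == 0x80


def matches_on_boundary(haystack: bytes, needle: bytes) -> list[int]:
    """Byte offsets of needle that start at a UTF-8 character boundary."""
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        if not is_continuation(haystack[pos]):
            hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits


data = "caf\u00e9".encode("utf-8")  # b"caf\xc3\xa9"
print(data.find(b"\xa9"))                                   # 4: a raw hit mid-character
print(matches_on_boundary(data, b"\xa9"))                   # []: rejected, not a boundary
print(matches_on_boundary(data, "\u00e9".encode("utf-8")))  # [3]
```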
In these notes I want to discuss the notion of "encoding" and introduce a
series of concepts that I find useful. My intention is to try to identify
abstract properties of encodings and to be able to classify encodings
according to these properties.
In its most general terms, an encoding is a binary
Christopher J Fynn wrote:
>
>Patrick,
>
>There are whole scripts for contemporary languages which
>are as yet unencoded in the Unicode Standard and some
>punctuation and other characters missing from already
>encoded scripts. IMO attention needs to be paid to making
>sure all these charac
"Falkor" wrote:
> I was thinking more that this would allow modern software to translate a
> lower-ASCII three-character sequence into a single unicode emoticon
> character that would be displayed properly regardless of OS and software,
> also alleviating the need for such developers to create pr
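The translation Falkor describes is essentially a table lookup. The sketch below assumes a hypothetical mapping onto two characters that already exist in the Miscellaneous Symbols block, U+263A WHITE SMILING FACE and U+2639 WHITE FROWNING FACE. It is deliberately naive: it would also rewrite ":-)" inside code, URLs or quoted text, which is one reason such conversion is usually left to the application rather than to the character encoding.

```python
import re
import unicodedata

# Hypothetical mapping from ASCII emoticons to existing Unicode characters.
EMOTICONS = {
    ":-)": "\u263a",  # WHITE SMILING FACE
    ":-(": "\u2639",  # WHITE FROWNING FACE
}
_PATTERN = re.compile("|".join(re.escape(seq) for seq in EMOTICONS))


def to_smiley_chars(text: str) -> str:
    """Replace each known three-character emoticon with its single character."""
    return _PATTERN.sub(lambda m: EMOTICONS[m.group(0)], text)


converted = to_smiley_chars("hi :-) bye :-(")
print(ascii(converted))            # 'hi \u263a bye \u2639'
print(unicodedata.name("\u263a"))  # WHITE SMILING FACE
```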
Patrick Andries wrote:
<< I wonder sometimes if the largest obstacle in the encoding
of smileys as characters is not the "universal" normalization
process itself. Had they been invented a few decades ago and
encoded "locally" in some kind of popular font/encoding (the
Netscape font for