Re: Sporadic Unicode revisited

Keld Jørn Simonsen Thu, 03 Oct 2002 11:17:37 -0700

On Thu, Oct 03, 2002 at 08:58:47AM -0700, Doug Ewell wrote:
> Kenneth Whistler <kenw at sybase dot com> wrote:
> 
> > Attempting to extend the system to Greek, Cyrillic, Hebrew, and Arabic
> > just (in my opinion) results in mnemonics that are harder to remember
> > than the character names, even. What is the real advantage of "s*",
> > "s=", "S+" and "s+" over "sigma", "es", "samekh" and "seen" for
> > occasional usage? You end up having to look up all those "mnemonics"
> > in a table anyway, if you actually want to use them.
> 
> I can see the advantage if you have extended text (not just an isolated
> letter).  "p=r=i=v=e=t=" or even "&p=&r=&i=&v=&e=&t=" is quite a bit
> easier to read than a sequence of vocalized letter names.
> 
> My problem with RFC 1345, one reason I never implemented a converter
> even though it was a temptation, involves the escape character &.
> U+0026, the real ampersand, is encoded as simply "&", but that conflicts
> with its use as an escape character.  So the sequence "B&O" (including
> the double quotation marks) is ambiguous; it could mean
> 
>     U+0022 U+0042 U+0026 U+004F U+0022
> 
> or
> 
>     U+0022 U+0042 U+0150


Well, you double the introducer & to represent itself, so the second
example is the correct interpretation.
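
A minimal sketch of that doubling rule in Python, for illustration only.
It assumes two-character mnemonics and a hypothetical one-entry table
(just the O" from the example above); the names decode, code_points and
MNEMONICS are invented here and are not from RFC 1345.

```python
# Hypothetical, deliberately tiny mnemonic table. Only the O" entry is
# taken from the example in this thread; a real table is far larger.
MNEMONICS = {'O"': "\u0150"}  # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
INTRO = "&"


def decode(text, table=MNEMONICS, intro=INTRO):
    """Decode two-character mnemonics; a doubled introducer is a literal."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != intro:
            # Ordinary character: copy it through unchanged.
            out.append(text[i])
            i += 1
        elif text.startswith(intro, i + 1):
            # "&&" stands for one literal ampersand.
            out.append(intro)
            i += 2
        else:
            # Assumption: every mnemonic is exactly two characters long.
            mnemonic = text[i + 1:i + 3]
            if mnemonic not in table:
                raise ValueError(f"unknown mnemonic {mnemonic!r} at offset {i}")
            out.append(table[mnemonic])
            i += 3
    return "".join(out)


def code_points(s):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)
```

With this sketch, code_points(decode('"B&O"')) gives
"U+0022 U+0042 U+0150", while the doubled form '"B&&O"' decodes to
"U+0022 U+0042 U+0026 U+004F U+0022".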

> Another problem is that the system is frozen in time in June 1992.
> There is no provision to extend the repertoire of RFC 1345 symbols to
> match the growing repertoire of Unicode.  Even U+20AC EURO SIGN cannot
> be represented!  At the same time, though, there are several
> "additional" symbols, mapped to the Private Use Area (U+E000 through
> U+E028), for characters assigned in ISO 6937 and other standards, some
> of which were subsequently added to Unicode or were already there (e.g.
> "DUTCH GUILDER SIGN," a.k.a. U+0192 LATIN SMALL LETTER F WITH HOOK).

The system, but not the RFC, has been extended, e.g. by ISO/IEC TR 14652.
You can always use Uxxxx or Uxxxxxxxx identifiers for 10646 characters.
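
As a rough illustration of the Uxxxx / Uxxxxxxxx form, here is a small
Python sketch. The function name u_identifier is hypothetical, and the
use of upper-case hex digits is an assumption, not something stated
above.

```python
def u_identifier(ch):
    """Return a Uxxxx or Uxxxxxxxx identifier for a single character."""
    cp = ord(ch)
    # Four hex digits up to U+FFFF, eight hex digits beyond that.
    # Upper-case hex is an assumption made for this sketch.
    return f"U{cp:04X}" if cp <= 0xFFFF else f"U{cp:08X}"
```

For example, u_identifier("\u20ac") returns "U20AC" for the EURO SIGN
mentioned above.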

Best regards
keld
