On Thu, Oct 03, 2002 at 08:58:47AM -0700, Doug Ewell wrote:
> Kenneth Whistler <kenw at sybase dot com> wrote:
>
> > Attempting to extend the system to Greek, Cyrillic, Hebrew, and Arabic
> > just (in my opinion) results in mnemonics that are harder to remember
> > than the character names, even. What is the real advantage of "s*",
> > "s=", "S+" and "s+" over "sigma", "es", "samekh" and "seen" for
> > occasional usage? You end up having to look up all those "mnemonics"
> > in a table anyway, if you actually want to use them.
>
> I can see the advantage if you have extended text (not just an isolated
> letter). "p=r=i=v=e=t=" or even "&p=&r=&i=&v=&e=&t=" is quite a bit
> easier to read than a sequence of vocalized letter names.
>
> My problem with RFC 1345, one reason I never implemented a converter
> even though it was a temptation, involves the escape character &.
> U+0026, the real ampersand, is encoded as simply "&", but that conflicts
> with its use as an escape character. So the sequence "B&O" (including
> the double quotation marks) is ambiguous; it could mean
>
> U+0022 U+0042 U+0026 U+004F U+0022
>
> or
>
> U+0022 U+0042 U+0150

Well, you double the introducer & to represent itself, so the second
interpretation (U+0022 U+0042 U+0150) is the correct one. A literal
ampersand in that position would be written "&&".

> Another problem is that the system is frozen in time in June 1992.
> There is no provision to extend the repertoire of RFC 1345 symbols to
> match the growing repertoire of Unicode. Even U+20AC EURO SIGN cannot
> be represented! At the same time, though, there are several
> "additional" symbols, mapped to the Private Use Area (U+E000 through
> U+E028), for characters assigned in ISO 6937 and other standards, some
> of which were subsequently added to Unicode or were already there (e.g.
> "DUTCH GUILDER SIGN," a.k.a. U+0192 LATIN SMALL LETTER F WITH HOOK).

The system, but not the RFC, has been extended, e.g. by ISO/IEC TR 14652.
You can always use Uxxxx or Uxxxxxxxx identifiers for 10646 characters.

Best regards
keld
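
To make the doubled-introducer rule concrete, here is a minimal Python sketch of a decoder and encoder for RFC 1345-style text that uses & as the intro character. It is an illustration under stated assumptions, not a full converter: it handles only two-character mnemonics, the MNEMONICS table is a tiny hand-picked subset rather than the RFC's table, and the names MNEMONICS, decode and encode are hypothetical. It does not handle longer mnemonics or the Uxxxx identifiers mentioned above.

```python
# Minimal sketch of RFC 1345-style escaping with "&" as the intro character.
# Assumptions: only two-character mnemonics; MNEMONICS is a tiny illustrative
# subset, not the full RFC 1345 table.

INTRO = "&"

# Hypothetical subset of the mnemonic table.
MNEMONICS = {
    'O"': "\u0150",  # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
    "p=": "\u043f",  # CYRILLIC SMALL LETTER PE
    "r=": "\u0440",  # CYRILLIC SMALL LETTER ER
    "i=": "\u0438",  # CYRILLIC SMALL LETTER I
    "v=": "\u0432",  # CYRILLIC SMALL LETTER VE
    "e=": "\u0435",  # CYRILLIC SMALL LETTER IE
    "t=": "\u0442",  # CYRILLIC SMALL LETTER TE
}


def decode(text: str) -> str:
    """Expand intro sequences; a doubled intro ("&&") stands for itself."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != INTRO:
            out.append(ch)
            i += 1
        elif text[i + 1:i + 2] == INTRO:
            # "&&" -> one literal ampersand
            out.append(INTRO)
            i += 2
        else:
            # "&" followed by a two-character mnemonic
            mnemonic = text[i + 1:i + 3]
            if mnemonic not in MNEMONICS:
                raise ValueError(f"unknown mnemonic {mnemonic!r} at offset {i}")
            out.append(MNEMONICS[mnemonic])
            i += 3
    return "".join(out)


def encode(text: str) -> str:
    """Inverse for the subset above: double the intro, escape known characters."""
    reverse = {v: k for k, v in MNEMONICS.items()}
    out = []
    for ch in text:
        if ch == INTRO:
            out.append(INTRO * 2)
        elif ch in reverse:
            out.append(INTRO + reverse[ch])
        else:
            out.append(ch)
    return "".join(out)
```

With this rule, decode('"B&O"') consumes &O" as one mnemonic and yields U+0022 U+0042 U+0150, while '"B&&O"' decodes to the five characters "B&O" including the quotation marks; encode doubles every ampersand so the round trip is unambiguous.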