Doug Ewell wrote:
> Yes, it's true.  Marco had sent me his UTF-Morse proposal just
> yesterday, along with a suggestion that I put together an 
> implementation for April Fool's Day.  And darned if I wasn't
> really going to do it.  As a JOKE.
> 
> But Marco, you need to check your invented sequences again.  
> The leading and trailing Morse code units for the
> (non-ASCII) multi-Morse characters conflict with some of the
> single-unit characters.  For example, U+002D -....- looks like
> a leading unit, and U+0023 .-.-.. looks like a trailing unit.

--- --- --- --- --- ..--.. ...

Sorry! Not only am I using everybody's bandwidth for an April fool in
advance, I am also getting all the details wrong!

I tried to simplify the wording while translating it into English, and I
messed everything up. So now I have to use more bandwidth to send a
corrected version.

> (It's only a JOKE, guys.  Take a breath.)

BTW, I recalled that, some time ago, the aficionados of fictional UTFs on
this list decided to call their creations "XTFs", in order to minimize the
possibility of confusion with real UTFs.

So, everybody reading this message now or in the coming years, please take
notice that XTF-Morse is *not* a UTF: just an aborted April fool! So please
don't come knocking at the Unicode Consortium's door asking for the latest
version of the specs for sending Unicode in Morse!

_ Marco


======================================================================
XTF-Morse [*] - "Bringing Unicode into the telegraph age!"


----------------------------------------------------------------------
0. Terminology

In this document, the following special terms are used:

- "Morse Dot": a short Morse signal; represented with "." in this
  document.

- "Morse Dash": a long Morse signal; represented with "-" in this
  document.

- "Morse Symbol": a sequence of one or more Morse Dots and Morse
  Dashes, constituting a Morse character such as a letter or a
  punctuation mark.

- "Morse Pause": a short pause which separates adjacent
  Morse symbols; represented with " " (a space) in this document.

- "Morse Space": a long pause which separates words; represented
  with "/" in this document.

- "Morse Oct": a special Morse Symbol representing three bits of
  a Unicode code point.


----------------------------------------------------------------------
1. Encoding characters in the "ASCII printable" range

Each Unicode character in the range U+0020..U+007E is encoded as a Morse
Space, as a single Morse Symbol, or as a sequence of two Morse
Symbols, as specified in the following table:

Code:  XTF-Morse:  Character name:
------ ----------- --------------------------
U+0020 /           SPACE (Morse Space)
U+0021 -----.      EXCLAMATION MARK [1]
U+0022 .-..-.      QUOTATION MARK
U+0023 .-.-..      NUMBER SIGN [1]
U+0024 ..-...      DOLLAR SIGN [1]
U+0025 ..-..-      PERCENT SIGN [1]
U+0026 ..-.-.      AMPERSAND [1]
U+0027 .----.      APOSTROPHE
U+0028 -.--.-      LEFT PARENTHESIS
U+0029 -.---.      RIGHT PARENTHESIS [1]
U+002A -.----      ASTERISK [1]
U+002B --....      PLUS SIGN [1]
U+002C --..--      COMMA
U+002D -....-      HYPHEN-MINUS
U+002E .-.-.-      FULL STOP
U+002F -..-.       SOLIDUS [1]
U+0030 -----       DIGIT ZERO
U+0031 .----       DIGIT ONE
U+0032 ..---       DIGIT TWO
U+0033 ...--       DIGIT THREE
U+0034 ....-       DIGIT FOUR
U+0035 .....       DIGIT FIVE
U+0036 -....       DIGIT SIX
U+0037 --...       DIGIT SEVEN
U+0038 ---..       DIGIT EIGHT
U+0039 ----.       DIGIT NINE
U+003A ---...      COLON
U+003B ---..-      SEMICOLON [1]
U+003C ---.-.      LESS-THAN SIGN [1]
U+003D ----..      EQUALS SIGN [1]
U+003E ---.--      GREATER-THAN SIGN [1]
U+003F ..--..      QUESTION MARK
U+0040 -.-.-.      COMMERCIAL AT [1]
U+0041 ..-- .-     LATIN CAPITAL LETTER A [2]
U+0042 ..-- -...   LATIN CAPITAL LETTER B [2]
U+0043 ..-- -.-.   LATIN CAPITAL LETTER C [2]
U+0044 ..-- -..    LATIN CAPITAL LETTER D [2]
U+0045 ..-- .      LATIN CAPITAL LETTER E [2]
U+0046 ..-- ..-.   LATIN CAPITAL LETTER F [2]
U+0047 ..-- --.    LATIN CAPITAL LETTER G [2]
U+0048 ..-- ....   LATIN CAPITAL LETTER H [2]
U+0049 ..-- ..     LATIN CAPITAL LETTER I [2]
U+004A ..-- .---   LATIN CAPITAL LETTER J [2]
U+004B ..-- -.-    LATIN CAPITAL LETTER K [2]
U+004C ..-- .-..   LATIN CAPITAL LETTER L [2]
U+004D ..-- --     LATIN CAPITAL LETTER M [2]
U+004E ..-- -.     LATIN CAPITAL LETTER N [2]
U+004F ..-- ---    LATIN CAPITAL LETTER O [2]
U+0050 ..-- .--.   LATIN CAPITAL LETTER P [2]
U+0051 ..-- --.-   LATIN CAPITAL LETTER Q [2]
U+0052 ..-- .-.    LATIN CAPITAL LETTER R [2]
U+0053 ..-- ...    LATIN CAPITAL LETTER S [2]
U+0054 ..-- -      LATIN CAPITAL LETTER T [2]
U+0055 ..-- ..-    LATIN CAPITAL LETTER U [2]
U+0056 ..-- ...-   LATIN CAPITAL LETTER V [2]
U+0057 ..-- .--    LATIN CAPITAL LETTER W [2]
U+0058 ..-- -..-   LATIN CAPITAL LETTER X [2]
U+0059 ..-- -.--   LATIN CAPITAL LETTER Y [2]
U+005A ..-- --..   LATIN CAPITAL LETTER Z [2]
U+005B ..---.      LEFT SQUARE BRACKET [1]
U+005C .-....      REVERSE SOLIDUS [1]
U+005D ..----      RIGHT SQUARE BRACKET [1]
U+005E .-...-      CIRCUMFLEX ACCENT [1]
U+005F ------      LOW LINE [1]
U+0060 ...---      GRAVE ACCENT [1]
U+0061 .-          LATIN SMALL LETTER A
U+0062 -...        LATIN SMALL LETTER B
U+0063 -.-.        LATIN SMALL LETTER C
U+0064 -..         LATIN SMALL LETTER D
U+0065 .           LATIN SMALL LETTER E
U+0066 ..-.        LATIN SMALL LETTER F
U+0067 --.         LATIN SMALL LETTER G
U+0068 ....        LATIN SMALL LETTER H
U+0069 ..          LATIN SMALL LETTER I
U+006A .---        LATIN SMALL LETTER J
U+006B -.-         LATIN SMALL LETTER K
U+006C .-..        LATIN SMALL LETTER L
U+006D --          LATIN SMALL LETTER M
U+006E -.          LATIN SMALL LETTER N
U+006F ---         LATIN SMALL LETTER O
U+0070 .--.        LATIN SMALL LETTER P
U+0071 --.-        LATIN SMALL LETTER Q
U+0072 .-.         LATIN SMALL LETTER R
U+0073 ...         LATIN SMALL LETTER S
U+0074 -           LATIN SMALL LETTER T
U+0075 ..-         LATIN SMALL LETTER U
U+0076 ...-        LATIN SMALL LETTER V
U+0077 .--         LATIN SMALL LETTER W
U+0078 -..-        LATIN SMALL LETTER X
U+0079 -.--        LATIN SMALL LETTER Y
U+007A --..        LATIN SMALL LETTER Z
U+007B --.-..      LEFT CURLY BRACKET [1]
U+007C --.--.      VERTICAL LINE [1]
U+007D --.-.-      RIGHT CURLY BRACKET [1]
U+007E --.---      TILDE [1]
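
As a rough illustration of how the table above might be applied, here is
a small Python sketch covering only the space, the digits and the Latin
letters. The names LETTERS, DIGITS, CAPITAL_PREFIX and
encode_ascii_subset are made up for this example, the punctuation rows
are left out for brevity, and joining symbols with a single space (a
Morse Pause) is an assumption based on the Terminology section.

```python
# Small letters a..z, copied from the table above.
LETTERS = dict(zip(
    "abcdefghijklmnopqrstuvwxyz",
    ".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- "
    "-. --- .--. --.- .-. ... - ..- ...- .-- -..- -.-- --..".split(),
))
# Digits 0..9, copied from the table above.
DIGITS = dict(zip(
    "0123456789",
    "----- .---- ..--- ...-- ....- ..... -.... --... ---.. ----.".split(),
))
CAPITAL_PREFIX = "..--"  # note [2]: sent before the small-letter symbol


def encode_ascii_subset(text):
    """Encode space, digits and Latin letters; anything else raises KeyError."""
    symbols = []
    for ch in text:
        if ch == " ":
            symbols.append("/")              # U+0020 is a Morse Space
        elif ch in DIGITS:
            symbols.append(DIGITS[ch])
        elif "A" <= ch <= "Z":
            symbols.append(CAPITAL_PREFIX)   # two Morse Symbols per capital
            symbols.append(LETTERS[ch.lower()])
        else:
            symbols.append(LETTERS[ch])
    return " ".join(symbols)                 # Morse Pause between symbols


print(encode_ascii_subset("Hi 42"))  # ..-- .... .. / ....- ..---
```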


----------------------------------------------------------------------
2. Encoding other Unicode characters
 
All other Unicode characters are encoded as sequences of 1 to 7
special Morse Symbols called "Morse Octs".

Each Morse Oct represents three bits of the Unicode code point; in
other words, Morse Octs are Morse-encoded octal digits.

There are two sets of Morse Octs: Morse Octs T0..T7 represent the last
octal digit in a sequence, whereas Morse Octs L0..L7 represent the
other octal digits.

Octal Digit: Morse Oct:
------------ ----------
L0            .-.--.
L1            .-.---
L2            .--...
L3            .--..-
L4            .--.-.
L5            .--.--
L6            .---..
L7            .---.-

Octal Digit: Morse Oct:
------------ ----------
T0            -...-.
T1            -...--
T2            -..-..
T3            -..-.-
T4            -..--.
T5            -..---
T6            -.-...
T7            -.-..-
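
Since the last octal digit uses its own set of symbols, a receiver can
apparently tell where each code point ends without any extra separator.
A minimal decoding sketch in Python, assuming the symbols have already
been split at Morse Pauses (the names L_OCTS, T_OCTS and decode_octs are
invented for this example):

```python
# Morse Octs copied from the two tables above; the list index is the digit.
L_OCTS = [".-.--.", ".-.---", ".--...", ".--..-",
          ".--.-.", ".--.--", ".---..", ".---.-"]
T_OCTS = ["-...-.", "-...--", "-..-..", "-..-.-",
          "-..--.", "-..---", "-.-...", "-.-..-"]
L_VALUE = {symbol: digit for digit, symbol in enumerate(L_OCTS)}
T_VALUE = {symbol: digit for digit, symbol in enumerate(T_OCTS)}


def decode_octs(symbols):
    """Turn a list of Morse Oct symbols into a list of code points."""
    code_points = []
    value = 0
    pending = False                      # True while inside a sequence
    for symbol in symbols:
        if symbol in L_VALUE:            # a non-final octal digit
            value = value * 8 + L_VALUE[symbol]
            pending = True
        elif symbol in T_VALUE:          # the final digit closes the sequence
            code_points.append(value * 8 + T_VALUE[symbol])
            value = 0
            pending = False
        else:
            raise ValueError("not a Morse Oct: " + symbol)
    if pending:
        raise ValueError("sequence ends without a T0..T7 Morse Oct")
    return code_points


# L3 L5 T1 is octal 351, i.e. U+00E9.
print([hex(cp) for cp in decode_octs([".--..-", ".--.--", "-...--"])])  # ['0xe9']
```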

The encoding of a Unicode code point proceeds in these steps:

- The Unicode code point is converted to an octal number.

- Leading zeros are stripped, if present; at least one digit is always
  kept, so that U+0000 is still encoded as T0.

- For each resulting octal digit apart from the last one, the
  corresponding Morse Oct L0..L7 is emitted.
   
- The Morse Oct T0..T7 corresponding to the last octal digit is emitted.
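
The steps above can be sketched in Python roughly as follows. The
function name encode_octs is hypothetical, and rejecting the range
U+0020..U+007E (which section 1 covers) is a design choice of this
sketch rather than something the steps themselves prescribe.

```python
# Morse Octs copied from the two tables above; the list index is the digit.
L_OCTS = [".-.--.", ".-.---", ".--...", ".--..-",
          ".--.-.", ".--.--", ".---..", ".---.-"]
T_OCTS = ["-...-.", "-...--", "-..-..", "-..-.-",
          "-..--.", "-..---", "-.-...", "-.-..-"]


def encode_octs(code_point):
    """Encode one code point outside U+0020..U+007E as Morse Octs."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0x20 <= code_point <= 0x7E:
        raise ValueError("this range is handled by the section 1 table")
    # format(..., "o") gives the octal digits without leading zeros,
    # and the single digit "0" for U+0000.
    digits = format(code_point, "o")
    leading = [L_OCTS[int(d)] for d in digits[:-1]]   # all but the last digit
    final = T_OCTS[int(digits[-1])]                   # the last digit
    return " ".join(leading + [final])


print(encode_octs(0x00E9))  # octal 351 -> L3 L5 T1: .--..- .--.-- -...--
print(encode_octs(0x0000))  # octal 0   -> T0:       -...-.
```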

The following table summarizes the number and kind of Octs generated
for each Unicode code point:

Code range:       Octal:  Generated Morse Octs:
----------------- ------- ---------------------
U+0000..U+0007    000000z Tz
U+0008..U+001F    00000yz Ly Tz
U+007F..U+01FF    0000xyz Lx Ly Tz
U+0200..U+0FFF    000wxyz Lw Lx Ly Tz
U+1000..U+7FFF    00vwxyz Lv Lw Lx Ly Tz
U+8000..U+3FFFF   0uvwxyz Lu Lv Lw Lx Ly Tz
U+40000..U+10FFFF tuvwxyz Lt Lu Lv Lw Lx Ly Tz
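
The range U+0020..U+007E does not appear in this table because section 1
handles it. The remaining rows can be checked with a few lines of
Python; oct_count and ROWS are names invented for this sketch, which
simply counts octal digits at both ends of each range.

```python
def oct_count(code_point):
    """Number of Morse Octs: the octal digits left after stripping zeros."""
    return len(format(code_point, "o"))


# (first, last, expected number of Morse Octs), copied from the table above.
ROWS = [
    (0x0000, 0x0007, 1),
    (0x0008, 0x001F, 2),
    (0x007F, 0x01FF, 3),
    (0x0200, 0x0FFF, 4),
    (0x1000, 0x7FFF, 5),
    (0x8000, 0x3FFFF, 6),
    (0x40000, 0x10FFFF, 7),
]

for first, last, expected in ROWS:
    # The count only grows with the code point, so checking both ends
    # of a range is enough to cover everything in between.
    assert oct_count(first) == expected == oct_count(last)

print("all rows agree")
```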


----------------------------------------------------------------------
3. Notes

[1]: Some Morse Symbols are unique to XTF-Morse, and are unknown in
     traditional Morse.

[2]: Capital letters use the same Morse Symbol as the corresponding
     small letter, preceded by the Morse code "..--" (which is unique
     to XTF-Morse).

[*]: In a previous version of this document, XTF-Morse was called
     "UTF-Morse". The name has been changed in order to emphasize
     that this is not a real UTF (Unicode Transformation Format),
     but just a parody of a UTF. Yes, a parody, a joke! Sorry, did
     you really read all of it seriously up to this point? :-)

======================================================================
