Bram Moolenaar wrote:
Cyril Slobin wrote:

On 4/20/07, Bram Moolenaar <[EMAIL PROTECTED]> wrote:

Please take the existing $VIMRUNTIME/spell/eo/main.aap and modify it a
bit to build the .spl file.  This can't be very difficult, you would
mostly use the command you type manually.
OK, I'll try this. Probably tomorrow.

Good.

What is strange is that myspell uses eo_l3 and you have eo_EO and eo_UX.
Why two regions?
Esperanto language uses some letters from Latin3 character set. Of
course, they are in Unicode too. But during half-century in
ASCII-based world there was established some conventions for
transcribing these letters in pure ASCII. There are still some
disagreements which one is most popular, or most standard, or most
suitable, but I believe that "Cxirkaux"-convention is most widely used
(no, I can't prove this with statistics). The convention is named
"Cxirkaux" after transcription of the word "Ĉirkaŭ" (I hope you have
an appropriate font installed to read this). It is handy to be able to
check Esperanto text in both modes (or choose any one of two).
Probably to make two files -- eo.ascii.spl and eo.utf-8.spl -- will be
theoretically more pure, but my solution allows to switch between two
modes fast.

OK.  So when the user does ":set spl=eo_eo" he still gets the "pure"
version?  It's important that the user has a choice of what words he
wants to accept.


I think you got it. For anyone who wants a better understanding of the matter, here's the explanation I could come up with:

The non-ASCII letters in the Esperanto variant of the Latin alphabet are the following (in upper and lower case):

Ĉĉ, C-circumflex
Ĝĝ, G-circumflex
Ĥĥ, H-circumflex
Ĵĵ, J-circumflex
Ŝŝ, S-circumflex
Ŭŭ, U-breve

There are several known ways to transliterate them to ASCII, and Esperantists have been arguing without end about which one is "the best", with partisans of the first and last ones below flaming each other for weeks on end in the Usenet group soc.culture.esperanto and elsewhere:

* "h-method". In the original work which brought Esperanto to the public in 1887, the work "The International Language" in five languages (Russian, Polish, French, German, and English) by "D-ro Esperanto", a nom-de-plume of Dr. Louis Lazarus Zamenhof, there was an additional sentence under the "Alphabet" (I'm quoting from the English edition):

        *Remark.* -- If it be found impracticable to print works with the
        diacritical signs (^,˘), the letter h may be substituted for the sign
        (^), and the sign (˘) may be omitted altogether.

IOW: ch, gh, hh, jh, sh, u.

One problem with this "official" or "Fundamental" notation is that it creates ambiguities between u and u-breve, and between letter+circumflex and letter+h. The latter can be found in e.g. chashundo (chas- + hundo, "a hunting dog"), danchalo (danc- + halo, "a ballroom"), etc. It is, however, the most "natural-looking" in that the most-used of these, c-circumflex and s-circumflex, have exactly the sound of English ch and sh; and u-breve, which represents the semivowel [w], is used almost exclusively after a vowel, to form the second part of the closing diphthongs [au] [eu] and, rarely, [ou]. (U-breve also occurs initially in a few imported words like uato "(hydrophilic) cotton; cotton wool", from French "ouate".)

Examples: chasi "to hunt", ghardeno "a garden", ehho "an echo", bovajho "beef", shajni "to seem", preskau "almost".
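
To make the ambiguity concrete, here is a minimal Python sketch. The function names and the naive decoding rule are my own assumptions for illustration, not something taken from any existing Esperanto or Vim tool. It transliterates to the h-method and then reads the result back by treating every c/g/h/j/s followed by h as an accented letter:

```python
# Illustration only: names and the naive decoding rule are hypothetical.
H_METHOD = {
    "ĉ": "ch", "ĝ": "gh", "ĥ": "hh", "ĵ": "jh", "ŝ": "sh", "ŭ": "u",
    "Ĉ": "Ch", "Ĝ": "Gh", "Ĥ": "Hh", "Ĵ": "Jh", "Ŝ": "Sh", "Ŭ": "U",
}
# Lowercase digraphs only, to keep the sketch short.
FROM_H = {"ch": "ĉ", "gh": "ĝ", "hh": "ĥ", "jh": "ĵ", "sh": "ŝ"}


def to_h_method(text: str) -> str:
    """Transliterate Unicode Esperanto to the h-method (lossy)."""
    return "".join(H_METHOD.get(ch, ch) for ch in text)


def naive_from_h_method(text: str) -> str:
    """Read every known letter+h digraph as an accented letter.

    The breve cannot be restored at all, since u-breve was written as plain u.
    """
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in FROM_H:
            out.append(FROM_H[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ("ĉashundo", "preskaŭ"):
        ascii_form = to_h_method(word)
        print(word, "->", ascii_form, "->", naive_from_h_method(ascii_form))
```

With this naive rule the "sh" in chashundo is misread as s-circumflex, and the breve in preskau is never recovered, which is exactly the pair of ambiguities described above.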

* "Slavic" method (for use on Latin typewriters for East-European countries): replace the circumflex or breve by a caron (a superscript similar in shape to the letter v). This method, though "unofficial" like all the ones below, is also somewhat "natural-looking", at least for Slavic-language people.

* "pre-circumflex method": ^c, ^g, ^h, ^j, ^s, ^u

* "post-circumflex method": c^, g^, h^, j^, s^, u^

* "x-method": cx, gx, hx, jx, sx, ux. This one has been widely used on computer systems (see at bottom) but it requires _three_ (not two) case variants for each letter: uppercase (for titles in all-caps), CX GX HX JX SX UX; titlecase (for the first letter of a sentence or of a proper noun etc.), Cx Gx Hx Jx Sx Ux; lowercase, cx gx hx jx sx ux.

All the above except the first avoid ambiguities, because Esperanto doesn't use the letter X (or a freestanding circumflex); but they (especially the last three) have been variously described as "ugly" and as "contrary to the «Fundamento de Esperanto»".

Add to these, any of the above except the last, with u-breve replaced by u-grave (which, like the dead-key circumflex, can be found on any typewriter for the French language).
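
For comparison, here is a small Python sketch of the x-method round trip, including the three case variants mentioned above. The function names are hypothetical, and the rule "use capital X only when the whole word is in capitals" is my own assumption about how to choose between titlecase and uppercase:

```python
import re

# Illustration only: names and the all-caps heuristic are assumptions.
X_TO_UNICODE = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ"}
UNICODE_TO_X = {accented: base for base, accented in X_TO_UNICODE.items()}


def from_x_method(text: str) -> str:
    """Convert cx/Cx/CX style digraphs to the accented letters."""
    def repl(match: re.Match) -> str:
        base = match.group(1)
        accented = X_TO_UNICODE[base.lower()]
        # Cx and CX both map to the single uppercase letter.
        return accented.upper() if base.isupper() else accented

    return re.sub(r"([cghjsuCGHJSU])[xX]", repl, text)


def to_x_method(text: str) -> str:
    """Convert accented letters to x-method digraphs, word by word."""
    def convert_word(match: re.Match) -> str:
        word = match.group(0)
        all_caps = len(word) > 1 and word.isupper()
        out = []
        for ch in word:
            base = UNICODE_TO_X.get(ch.lower())
            if base is None:
                out.append(ch)  # not an accented Esperanto letter
            elif ch.isupper():
                # Uppercase variant (CX) in all-caps words, else titlecase (Cx).
                out.append(base.upper() + ("X" if all_caps else "x"))
            else:
                out.append(base + "x")
        return "".join(out)

    return re.sub(r"\w+", convert_word, text)


if __name__ == "__main__":
    for word in ("ĉirkaŭ", "Ĉirkaŭ", "ĈIRKAŬ"):
        ascii_form = to_x_method(word)
        print(word, "->", ascii_form, "->", from_x_method(ascii_form))
```

Because Esperanto itself has no letter X, decoding needs no dictionary; the only guesswork is in the encoding direction, where the case of the x has to be chosen.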

IIUC, the eo_EO region would use the "actual" letters with superscripts (as found in Latin3 or Unicode), and the eo_UX region would use the x-method (the last one in the list above), which is widely used on computer systems, including in e-mails emanating from the Universal Esperanto Association (i.e., the Esperantist headquarters) in Rotterdam.


Best regards,
Tony.
--
Our country has plenty of good five-cent cigars, but the trouble is
they charge fifteen cents for them.
