Warning: this email is in UTF-8. "U+nnnn" below (where nnnn is in hex) is the Unicode notation for a character (Unicode codepoint) given by value.

Gene Kwiecinski wrote:
Rare enough .. but besides "oeuf" it also occurs in such very common
words as "voeu" [wish] and "coeur" [heart], and it really bothers me
when I see them incorrectly spelled in web pages, for instance. I spot
it, and after that I tend to lose focus and can't take in what I'm
reading for a short while.

How're they misspelled?

The above should all have œ (oe digraph, as one character), not o followed by e (two characters): œuf, vœu, cœur.



Or, if none of the distributed keymaps is exactly what you want, you
can write your own. It isn't hard. See ":help :loadkeymap" for the
theory, and look at the contents of Bram's
$VIMRUNTIME/keymap/accents.vim and my
$VIMRUNTIME/keymap/esperanto_utf8.vim for a couple of simple examples.
Already started on this: copied accents.vim to ~/.vim/keymap/,
renamed it to foreign.vim, and added the Spanish inverted question and
exclamation marks, which for now I have mapped to "!!" and "??".
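For illustration, a keymap of that kind might look like the minimal sketch below. It is not a copy of accents.vim. The file name follows the message above, and the "/l" and "/L" key sequences for ł and Ł are hypothetical choices that only show how a character above 255 goes in.

```vim
" ~/.vim/keymap/foreign.vim -- hypothetical personal keymap (sketch only)
" The file contains non-ASCII text, so declare its encoding first.
scriptencoding utf-8
" Short name shown in the status line while the keymap is active.
let b:keymap_name = "foreign"
" Each line after 'loadkeymap' has the form:  {lhs}  {rhs}  [comment]
loadkeymap
!!	¡	U+00A1 INVERTED EXCLAMATION MARK
??	¿	U+00BF INVERTED QUESTION MARK
/l	ł	U+0142 Polish, hypothetical lhs
/L	Ł	U+0141 Polish, hypothetical lhs
```

To use it, run ":set keymap=foreign". Vim first looks in 'runtimepath' for keymap/foreign_utf-8.vim (when 'encoding' is utf-8) and then for keymap/foreign.vim. In Insert mode, Ctrl-^ toggles the keymap on and off. Sequences like "!!" and "??" work well because they rarely occur in normal typing. Something like "oe" would be a poor choice, since it also appears in ordinary words such as "poet".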

Come to think of it, French would appear to have the most annoying
spelling system of the West European languages that I have some degree
of familiarity with.  Spanish, Italian, and German seem to use fewer
non-ASCII characters.

Spanish and Italian can both accent any vowel (grave accent in Italian, acute accent in Spanish), and Spanish also has inverted ? and !.

Catalan has some grave-accented vowels, some acute-accented ones, some which can take either, the c-cedilla, and also a letter group which I've seen in no other language: ll with a dot at mid-height between the two letters, meaning "geminated l" (pron. as French ll) as opposed to "palatalized l" (pron. more or less as Spanish ll). It is written ŀl, U+0140 U+006C, decimal 320 108; with digraphs, Ctrl-K l . followed by a plain l. Examples: geminated in coŀlinear, palatalized in coll (mountain pass). The uppercase equivalent, ĿL, is used only in all-capitals titles, since the two ells of a geminated pair belong to different syllables and so never start a word; Ŀ is U+013F, dec. 319, Ctrl-K L . The Spanish inverted ? and ! are optional in Catalan, but they can be used in the middle of a sentence to mark the start of the question or exclamation.


In order to set up my foreign language keymap correctly I would really
need tables of all the characters that occur in these languages, decide
which ones are common enough to be worth adding to the keymap, and make
sure I build a scheme that's coherent before I get my fingers to
memorize it.  I'll scour the wikis later today and see if I can find
anything useful.

If you wouldn't mind, definitely keep me in the loop on this one, as
I've got something of an interest.

Offhand, some contributions and questions:

beta-looking SS (German)

ß eszett, U+00DF, decimal 223. Don't know if the "accents" keymap has it. With digraphs: Ctrl-K s s

slashed 'l' (Polish)

ł -- don't know the name; I'd guess "hard l" (in Polish it is pronounced more or less like English w, but it is etymologically related to the Russian hard l, which is pronounced like the ll in English bell). Don't think "accents" has it, as it is >255. U+0142, decimal 322, Ctrl-K l / -- Uppercase: Ł U+0141, dec. 321, Ctrl-K L /

slashed 'o' (Scandinavian or thereabouts, not sure if Dutch or other)

Danish and Norwegian, not Swedish, equivalent of German/Swedish "ö". ø, U+00F8, decimal 248, Ctrl-K o / -- Uppercase: Ø, U+00D8, decimal 216, Ctrl-K O / -- I guess /O and /o with the "accents" keymap. In other languages, a similar or identical glyph is used to mean "diameter".

AElig/aelig/OElig/oelig (Latin, etc.)

Among modern languages:
  Danish
    Æ  AE  U+00C6  dec.198   Ctrl-K A E
    æ  ae  U+00E6  dec.230   Ctrl-K a e
  French
    Œ  OE  U+0152  dec.338   Ctrl-K O E
    œ  oe  U+0153  dec.339   Ctrl-K o e

ccedil/Ccedil (how done, ",C"?)

In the "accents" keymap, I think so. With digraphs, Ctrl-K c , or Ctrl-K C , -- U+00E7, decimal 231 (ç) and U+00C7, decimal 199 (Ç).

ecedil(?) (also Polish, possibly other vowels, 'though don't recall offhand)

ę: it's not a cedilla (it's turned the other way); it's called an ogonek (thus: e-ogonek). If your keymap hasn't got it, it's Ctrl-K e ; -- U+0119, decimal 281.


Oh, someone on the list is native Polish, so might ask him.  Was it
Mikolaj?

Dunno anyone Dutch who'd recall the slashed-'o'.

It's not Dutch, it's Danish and Norwegian. Dutch only has ij and IJ (usually typed as two letters; it's up to the browser to fetch the "presentation form" if deemed appropriate, as with French fi ffi fl ffl etc.). IJ is used e.g. in IJsland (Iceland); IJszee (the Glacial Ocean, usually the Arctic, but there is also a "Zuidelijke IJszee" around the Antarctic continent); IJ (a river in the Netherlands, with the town of IJmuiden at its mouth IIRC); IJssel (two other rivers, the Hollandse IJssel and the Gelderse IJssel, the latter ending up in the IJsselmeer); and ijs (ice) becomes IJs at the start of a sentence, etc. Dutch also has some accented vowels, either in foreign words only, such as é in café (tavern, coffeehouse), or optionally to show word stress, as in hét, dé (with é pronounced like a stressed schwa) when one wants to stress the article.


How to enter Aring (eg, Ångstrom)?  "oA"??  Synonymous with "aa" (eg, "Haas" == "Hås"?)

With digraphs: Ctrl-K A A (uppercase, Å, U+00C5, dec. 197), Ctrl-K a a (lowercase, å, U+00E5, dec.229).



Oh, well...




To get all digraphs into a file: first set 'encoding' to UTF-8 and then:

        :set encoding=utf-8
        :redir > ~/digraphs.txt
        " or :redir! > ~/digraphs.txt to overwrite an existing file
        :set nomore | dig | set more
        :redir END
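Rather than setting 'encoding' by hand each session, you can put it in the vimrc. The fragment below is a commonly used pattern, given here as a sketch. It assumes classic Vim built with the +multi_byte feature, and the 'fileencodings' list is only an example to adapt to the encodings you actually meet. Changing 'encoding' is best done at startup, before any text has been loaded.

```vim
" vimrc fragment (sketch): use UTF-8 internally
if has("multi_byte")
  " Remember how the terminal encodes keystrokes before 'encoding' changes.
  if &termencoding == ""
    let &termencoding = &encoding
  endif
  set encoding=utf-8
  " New files are written as UTF-8 by default.
  setglobal fileencoding=utf-8
  " When reading: try a BOM, then UTF-8, then fall back to Latin-1.
  set fileencodings=ucs-bom,utf-8,latin1
endif
```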

To find any Unicode codepoint: http://www.unicode.org/charts/

If you know the numeric value of a character (0 to 0x7FFFFFFF in Vim; the Unicode standard itself stops at U+10FFFF), see ":help i_CTRL-V_digit" for how to input it (e.g. Ctrl-V u 0153 for œ). Don't forget that where mswin.vim maps Ctrl-V to the paste operation, the "normal" Vim functions of Ctrl-V are taken over by Ctrl-Q.

If you have any character under the cursor in a text buffer in Vim, ga will show it to you in alpha, octal, hex and decimal.

You can also copy and paste any character which has a representation in your current 'encoding'.

Depending on the language you're typing in, one keymap or another may be appropriate; each keymap has its own mappings for "foreign" characters.


Best regards,
Tony.
