Re: Other European languages on a US keyboard

A.J.Mechelynck Mon, 24 Jul 2006 20:37:06 -0700

cga2000 wrote:

On Mon, Jul 24, 2006 at 08:29:10PM EDT, A.J.Mechelynck wrote:

cga2000 wrote:

On Mon, Jul 24, 2006 at 05:59:42PM EDT, Christian Ebert wrote:

* A.J.Mechelynck on Saturday, July 22, 2006 at 22:40:45 +0200:
The French oe (o, e-dans-l'o) is not defined in the Latin1 encoding, neither in capitals (as for titles or if the word "oeuf" [egg] is the first of a sentence), nor in lowercase. You need UTF-8 for it,
No. Just latin9, aka ISO-8859-15 (look at the header of this mail).
Mon coeur.

This is on a Mac with a German keyboard, but actually using an
American keyboard layout. I enter the "oe" with Alt-q (the "Alt"
key on a Mac keyboard corresponds to the Modifier key on other
keyboards, I believe).

Could this be Mac-specific?

I switched to encoding=latin9.

When I do a Ctrl-K o e and a Ctrl-K O E this is what I get:

½ ¼

confirmed by the :dig command.

I looked carefully at the output of :dig and I couldn't see our elusive
"e dans l'o" either.

So I switched to the French ISO-8859-15, then the US version of
latin9.. still can't find that "o dans l'e".

Strange thing is that the font I use on terminals does have these two
characters (upper/lower case E dans l'O..) in the exact same spot where Vim
displays the above fractions..
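Those two fractions are a useful clue. ISO-8859-15 (latin9) is latin1 with eight positions changed, and two of them are exactly where latin1 keeps ¼ and ½: bytes 0xBC and 0xBD hold Œ and œ in latin9. So if any layer (the terminal, the console font mapping, or Vim's own 'encoding') still reads those bytes as latin1, the ligature shows up as a fraction. The short Python sketch below only illustrates the byte values; the codec names "latin-1" and "iso8859_15" are Python's, not Vim option values.

```python
# Decode the same byte as latin1 (ISO-8859-1) and as latin9 (ISO-8859-15).
def show(byte_value: int) -> tuple[str, str]:
    raw = bytes([byte_value])
    return raw.decode("latin-1"), raw.decode("iso8859_15")

for b in (0xBC, 0xBD):
    in_latin1, in_latin9 = show(b)
    print(f"0x{b:02X}: latin1 {in_latin1}  latin9 {in_latin9}")
# 0xBC: latin1 ¼  latin9 Œ
# 0xBD: latin1 ½  latin9 œ

# The ligature has a byte in latin9 but no slot at all in latin1.
print("œ".encode("iso8859_15"))   # b'\xbd'
try:
    "œ".encode("latin-1")
except UnicodeEncodeError:
    print("œ has no latin1 representation")
```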

Try the following (in gvim):

.. with all the goings-on in this thread I never had a chance to
mention the fact that I do not use gvim. I try to do everything in a
terminal (under GNU screen) because text-mode apps were designed for
the keyboard, so they work a lot better than GUIs for those of us who
prefer not to use mice.

Gvim accepts keyboard commands just like console Vim; mouse-addicted people can use the mouse as well. It has a lot more colours (typically 16 million rather than 16) and it can change fonts on the fly (change the font from Courier to Lucida to whatever, using only Vim keyboard commands). It can do "real" boldface and italics, as well as straight or curly underlining. And it can use Unicode: see further down.

 :echo has("multi_byte")

the answer should be 1. If it is zero, your version of gvim cannot
handle UTF-8.

Works fine if I switch my locale to UTF-8. Vim automatically figures out
what I want and :dig displays the "o dans l'e" (both the lower and upper
case versions) among a gazillion other digraphs. Then I can use the
usual Ctrl-K oe .. save the file.. pass this on to LaTeX and provided I
have the correct LaTeX statements to activate UTF-8 (that's what took
forever to figure out the other day..) I get my "coeurs", "voeux" and
"boeufs" rendered correctly in xdvi/gv .. *and* the ensuing
printout looks great too.

The problem with this is that I haven't found a comfortable way to
run Vim in UTF-8 mode and the rest of my stuff in 8-bit mode.

Well, in an xterm (or konsole, or Windows DOS box), console Vim is dependent on whatever charset the console is using. If your xterm (or whatever) is in Latin1, you cannot use the French oe any more than you can use Cyrillic or Greek. Gvim, on the other hand, can display anything for which you have a glyph in a font.
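To make the "any more than Cyrillic or Greek" point concrete: an 8-bit charset simply has no byte for characters outside its 256 slots. This Python sketch (sample characters chosen for illustration; codec names are Python's spelling) checks which characters each charset can represent at all. Only UTF-8 covers all of them, which is why, for console Vim, the terminal's charset sets the limit.

```python
# Sample characters (chosen for illustration) and three charsets,
# spelled the way Python names its codecs.
SAMPLES = {"French oe": "œ", "Cyrillic zhe": "Ж", "Greek omega": "Ω", "e acute": "é"}
CHARSETS = ("latin-1", "iso8859_15", "utf-8")

def can_encode(text: str, charset: str) -> bool:
    """True if every character of text has a byte representation in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True

for label, ch in SAMPLES.items():
    row = ", ".join(f"{cs}={can_encode(ch, cs)}" for cs in CHARSETS)
    print(f"{label}: {row}")
```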


Over the week-end I found that I can run Vim in a separate "unicode"
xterm but that's not what I want because I lose screen's copy/paste and
more importantly it destroys my attempt at running a fully integrated
"desktop".

Another problem I have run into is that text files created in
UTF-8 mode are a mess when browsed in latin1/9 mode. I also have
problems when I print "unicode" files.. I once created a nice table
with those box-drawing characters that were available in UTF-8 mode and
it was really nice on-screen.. but when I tried to print it, all I got
was rows and columns of question marks.
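The "mess" is predictable: every non-ASCII character takes two or more bytes in UTF-8, and a latin1 viewer shows each of those bytes as a separate character. A minimal Python sketch of that misreading (the helper name is made up for illustration); it does not model the printing problem, which depends on whatever print filter is in use:

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as latin1."""
    return text.encode("utf-8").decode("latin-1")

print(misread_as_latin1("café"))          # cafÃ©
print(repr(misread_as_latin1("cœur")))    # 'cÅ\x93ur'  (0x93 is a control code in latin1)
print(len(misread_as_latin1("┌")))        # 3: one box-drawing character, three bytes
```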

So I switched back to latin1 pending better internationalization support
in some applications (slrn, ELinks.. mutt should work but it's tricky)
and maybe more importantly until I acquire a better understanding of
running a unicode locale in X/linux and the implications thereof..

 :if &tenc == "" | let &tenc = &enc | endif | set enc=utf-8
 :new

then hit i (to enter Insert mode) and type ^Vu0153 (where ^V is Ctrl-V, unless you
use Ctrl-V to paste, in which case it is Ctrl-Q).
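For reference, ^Vu0153 just types the code point in hexadecimal; with 'encoding' set to utf-8, Vim stores that character as two bytes. A Python sketch confirming what U+0153 is and how it is encoded (unicodedata is Python's standard module, nothing to do with Vim):

```python
import unicodedata

oe = chr(0x0153)                      # the character ^Vu0153 inserts
print(unicodedata.name(oe))           # LATIN SMALL LIGATURE OE
print(oe.encode("utf-8").hex(" "))    # c5 93
print(unicodedata.name(chr(0x0152)))  # LATIN CAPITAL LIGATURE OE (^Vu0152)
```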

If you see anything other than the oe digraph, then your 'guifont' is
plain wrong. See http://vim.sourceforge.net/tips/tip.php?tip_id=632
about how to choose a better one.

Well.. actually.. I ran some tests in latin-9 earlier.. trying to figure
out this "o dans l'e" business.. that was on a linux console..  and
that's where I realized that I was still running a unicode font.. both
on the linux console and in 'X'.. :-) .. It seems I never switched back
after my brief incursion into unicode territory..  and since I haven't
had any problems displaying and printing text since I switched back.. I
would say that the font is ok..  And that UTF-8 stuff is indeed
backward-compatible?

The font is called "terminus" and I like it a lot because it looks like
a fixed-width version of MS's Verdana, which is my favorite screen font.

see http://wwww.geocities.com/cga9999/wee.png for an excellent
screenshot.

Thanks

cga



So what is Unicode and what is UTF-8?

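Unicode is a system to allow using *together* all writing systems known to man. That's a lot. A "character space" of 1,114,112 code points (U+0000 to U+10FFFF) has been set apart for all those characters; the original 31-bit design of ISO 10646 (UCS-4) even reserved over 2 billion slots.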
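A quick way to check the upper bound of the code space actually in use is Python's str type, which (since Python 3.3) accepts every code point up to U+10FFFF and rejects anything above it. A small sketch:

```python
import sys

print(hex(sys.maxunicode))        # 0x10ffff
print(sys.maxunicode + 1)         # 1114112 code points, U+0000 to U+10FFFF
try:
    chr(0x110000)
except ValueError:
    print("U+110000 is outside the Unicode code space")
```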

Unicode is also a number of 'encodings': ways to represent that data in memory or on a storage medium. The simplest of these encodings is UTF-32 (aka UCS-4): use 32 bits for each character, in the endianness of your machine. The most economical is usually UTF-8, which uses between one and four bytes per character; also, it represents 7-bit ASCII identically to ASCII; characters 128-255 of the Latin1 encoding have the same ordinal position in the scheme but are represented by two bytes each. Also, of the 3 principal Unicode encodings (UTF-8, UTF-16 and UTF-32), UTF-8 is the only one which isn't subdivided into "big-endian" and "little-endian" varieties. There is no risk of out-of-phase errors, because of the allotment of the bytes: 0x00-0x7F are single-byte characters, 0x80-0xBF are "trailing bytes" (any byte except the first in a multi-byte character), 0xC0-0xFF are "header bytes" (the first byte in a multibyte character; in practice only 0xC2-0xF4 occur in valid UTF-8), and in addition, the header byte specifies how long the sequence is.
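The byte ranges above are enough to find character boundaries without any other context. The following Python sketch (function names are illustrative, not from any library) classifies bytes, reads the sequence length from a header byte, and steps back from an arbitrary offset to the start of a character, which is why a reader cannot get "out of phase":

```python
# Sort bytes into the three groups described above.
def classify(byte: int) -> str:
    if byte <= 0x7F:
        return "single"      # 0x00-0x7F: ASCII, a whole character by itself
    if byte <= 0xBF:
        return "trailing"    # 0x80-0xBF: never the first byte of a character
    return "header"          # 0xC0-0xFF: starts a multibyte character

def sequence_length(first: int) -> int:
    """Read the length of a UTF-8 sequence from its first byte."""
    if first <= 0x7F:
        return 1
    if 0xC0 <= first <= 0xDF:
        return 2             # 110xxxxx
    if 0xE0 <= first <= 0xEF:
        return 3             # 1110xxxx
    if 0xF0 <= first <= 0xF7:
        return 4             # 11110xxx
    # Trailing bytes, and 0xF8-0xFF (unused since UTF-8 was capped at 4 bytes)
    raise ValueError(f"0x{first:02X} cannot start a UTF-8 sequence")

def char_start(data: bytes, i: int) -> int:
    """From any byte offset, step back to the start of its character."""
    while i > 0 and classify(data[i]) == "trailing":
        i -= 1
    return i

for ch in ("A", "é", "œ", "€", "\U0001F600"):
    data = ch.encode("utf-8")
    assert sequence_length(data[0]) == len(data)
    print(f"U+{ord(ch):04X}", data.hex(" "), [classify(b) for b in data])

word = "cœur".encode("utf-8")       # b'c\xc5\x93ur'
print(char_start(word, 2))          # 1: byte 2 is a trailing byte of "œ"
```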

Vim can translate back and forth between Unicode and any other charset quite easily, so (in gvim, or in Vim running in a Unicode terminal) you may set 'encoding' to utf-8 and use ":setlocal fileencoding=latin1" for Western Europe, ":setlocal fileencoding=sjis" for Japanese, etc., on a file-by-file basis. All those files can coexist in a single instance of gvim.
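Python happens to follow the same model: text is held as Unicode in memory and converted only when a file is read or written, with the charset chosen per file. A sketch of that idea, using hypothetical file names in a temporary directory; note that Python spells the charsets "latin-1" and "shift_jis" where Vim uses latin1 and sjis:

```python
import tempfile
from pathlib import Path

# Hypothetical files, each with its own on-disk charset. In memory every
# text is an ordinary Unicode str, much like buffers under 'encoding'=utf-8.
FILES = {
    "french.txt": ("latin-1", "café crème"),
    "japanese.txt": ("shift_jis", "日本語"),
}

def round_trip(directory: Path) -> dict[str, bool]:
    """Write each text in its own charset, read it back, compare."""
    results = {}
    for name, (charset, text) in FILES.items():
        path = directory / name
        path.write_text(text, encoding=charset)   # roughly :w with 'fileencoding'
        raw = path.read_bytes()
        results[name] = raw.decode(charset) == text
        print(f"{name}: {charset}, {len(raw)} bytes on disk")
    return results

with tempfile.TemporaryDirectory() as tmp:
    RESULTS = round_trip(Path(tmp))
print(RESULTS)
```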

By using ":setlocal bomb" on a Unicode file, you can place at its beginning the code point U+FEFF ZERO WIDTH NO-BREAK SPACE, which is then used by other programs (or by Vim itself) to identify the fact that this file is in Unicode, and in which particular Unicode encoding and endianness.
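How a program might use that mark: compare the first bytes of a file against the encoded forms of U+FEFF. A Python sketch using the BOM constants from the standard codecs module; the function name is illustrative, and Vim's own detection (driven by 'fileencodings') is not shown here. The UTF-32-LE mark must be tested before the UTF-16-LE one, because it begins with the same two bytes:

```python
import codecs
from typing import Optional

# Longest marks first: BOM_UTF32_LE (ff fe 00 00) starts with BOM_UTF16_LE (ff fe).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes) -> Optional[str]:
    """Return the Unicode encoding announced by a leading U+FEFF, if any."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

text = "\ufeffcœur"   # U+FEFF at the start, as written when 'bomb' is set
for enc in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    print(enc, "->", sniff_bom(text.encode(enc)))
print(sniff_bom("plain text".encode("latin-1")))   # None: no mark at all
```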


For more details, see
  :help mbyte.txt
  http://vim.sourceforge.net/tips/tip.php?tip_id=246

  section 37 (last) of the Vim FAQ: http://vimdoc.sourceforge.net/htmldoc/vimfaq.html

  http://www.unicode.org/
  http://www.cl.cam.ac.uk/~mgk25/unicode.html


Best regards,
Tony.
