cga2000 wrote:
On Mon, Jul 24, 2006 at 08:29:10PM EDT, A.J.Mechelynck wrote:
cga2000 wrote:
On Mon, Jul 24, 2006 at 05:59:42PM EDT, Christian Ebert wrote:
* A.J.Mechelynck on Saturday, July 22, 2006 at 22:40:45 +0200:
The French oe (œ, "e dans l'o") is not defined in the Latin1 encoding,
neither in capitals (as for titles or if the word "oeuf" [egg] is the
first of a sentence), nor in lowercase. You need UTF-8 for it,
No. Just latin9 or ISO8859-15 (Look at the header of this mail).
Mon coeur.
This is on a Mac with a German keyboard, but actually using an
American keyboard layout. I enter the "oe" with Alt-q (the "Alt"
key on the Mac keyboard corresponds to the Modifier key on other
keyboards, I believe).
Could this be Mac-specific?
I switched to encoding=latin9.
When I do a Ctrl-K o e and a Ctrl-K O E this is what I get:
½ ¼
confirmed by the :dig command.
I looked carefully at the output of :dig and I couldn't see our elusive
"e dans l'o" either.
So I switched to the French ISO-8859-15, then the US version of
latin9.. still can't find that "e dans l'o".
Strange thing is that the font I use on terminals does have these two
characters (upper/lower case E dans l'O..) in the exact same spot Vim
displays the above fractions..
Try the following (in gvim):
.. with all the goings-on in this thread I never had a chance to
mention the fact that I do not use gvim. I try to do everything in a
terminal (under gnu/screen) because text-mode apps were designed for
the keyboard so they work a lot better than gui's for those of us who
prefer not to use mice.
Gvim can use keyboard commands just like console Vim, or mouse-addicted
people can use the mouse too. It has a lot more colours (typically
16 million rather than 16) and it can change fonts on the fly (change
the font from Courier to Lucida to whatever, using only Vim keyboard
commands). It can do "real" boldface and italics, as well as straight or
curly underlining. And it can use Unicode: see further down.
:echo has("multi_byte")
the answer should be 1. If it is zero, your version of gvim cannot
handle UTF-8.
Works fine if I switch my locale to UTF-8. Vim automatically figures out
what I want and :dig displays the "e dans l'o" (both the lower and upper
case versions) among a gazillion other digraphs. Then I can use the
usual Ctrl-K oe .. save the file.. pass this on to LaTeX and, provided I
have the correct LaTeX statements to activate UTF-8 (that's what took
forever to figure out the other day..), I get my "coeurs", "voeux" and
"boeufs" rendered correctly in xdvi/gv .. *and* the ensuing
printout looks great too.
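(For the record, the LaTeX side boils down to a couple of preamble lines,
roughly the following, assuming a reasonably recent LaTeX installation:

\usepackage[utf8]{inputenc}   % read the .tex source as UTF-8
\usepackage[T1]{fontenc}      % a font encoding that has the oe ligature

With those two, an oe typed straight into the source comes out as a proper
ligature in the xdvi/gv output.)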
The problem with this is that I haven't found a comfortable way to
run Vim in UTF-8 mode and the rest of my stuff in 8-bit mode.
Well, in an xterm (or konsole, or Windows Dos Box), console Vim is
dependent on whatever charset the console is using. If your xterm (or
whatever) is in Latin1, you cannot use the French oe any more than you can
use Cyrillic or Greek. Gvim, on the other hand, can display anything for
which you have a glyph in a font.
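That said, console Vim can work internally in UTF-8 while still talking to
the terminal in its own 8-bit charset, by fixing 'termencoding' before
changing 'encoding'. A minimal sketch for the vimrc (assuming a Latin-1 or
Latin-9 terminal; values to taste):

if &termencoding == ""
  let &termencoding = &encoding   " remember the terminal's own charset
endif
set encoding=utf-8                " Vim's internal representation
set fileencodings=ucs-bom,utf-8,latin1   " guesses for reading existing files

Vim then converts on the fly between UTF-8 (inside) and whatever the
terminal understands; characters the terminal cannot show appear as
question marks but stay intact in the buffer.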
Over the week-end I found that I can run Vim in a separate "unicode"
xterm but that's not what I want because I lose screen's copy/paste and
more importantly it destroys my attempt at running a fully integrated
"desktop".
Other problems that I have run into is that text files created when in
UTF-8 mode are a mess when browsed in latin1/9 mode. I also have
problems when I print "unicode" files.. I once created a nice table
with those box-drawing characters that were available in UTF-8 mode and
it was really nice on-screen.. but when I tried to print it, all I got
was rows and columns of question marks.
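(By the way, one thing that does help with the browsing problem is telling
Vim the encoding explicitly for a single file, without changing 'encoding';
a sketch, with made-up file names:

:e ++enc=utf-8 table.txt       " reread just this file as UTF-8
:w ++enc=latin1 table-l1.txt   " write a Latin-1 copy; characters that have
                               " no Latin-1 equivalent cannot be converted

This works best when 'encoding' is utf-8, since otherwise anything outside
the current charset is lost on the way in.)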
So I switched back to latin1 pending better internationalization support
in some applications (slrn, ELinks.. mutt should work but it's tricky)
and maybe more importantly until I acquire a better understanding of
running a unicode locale in X/linux and the implications thereof..
:if &tenc=="" | let &tenc = &enc | endif | set enc=utf-8
:new
then i (set Insert mode) and ^Vu0153 (where ^V is Ctrl-V, unless you
use Ctrl-V to paste, in which case it is Ctrl-Q).
If you see anything other than the oe digraph, then your 'guifont' is
plain wrong. See http://vim.sourceforge.net/tips/tip.php?tip_id=632
about how to choose a better one.
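For instance (illustrative values only, assuming a GTK2 gvim):

:set guifont=DejaVu\ Sans\ Mono\ 11

or interactively

:set guifont=*

which pops up a font chooser on GUIs that support it; ":set guifont?" then
shows the string to copy into the vimrc.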
Well.. actually.. I ran some tests in latin-9 earlier.. trying to figure
out this "o dans l'e" business.. that was on a linux console.. and
that's where I realized that I was still running a unicode font.. both
on the linux console and in 'X'.. :-) .. It seems I never switched back
after my brief incursion into unicode territory.. and since I haven't
had any problems displaying and printing text since I switched back.. I
would say that the font is ok.. And that UTF-8 stuff is indeed
backward-compatible?
The font is called "terminus" and I like it a lot because it looks like
a fixed-width version of MS's Verdana, which is my favorite screen font.
see http://wwww.geocities.com/cga9999/wee.png for an excellent
screenshot.
Thanks
cga
So what is Unicode, and what is UTF-8?
Unicode is a system to allow using *together* all writing systems known
to man. That's a lot. A "character space" with over a million slots has
been set apart for all those characters.
Unicode is also a number of 'encodings' -- manners to represent that
data in memory or on a storage medium. The simplest of these encodings
is UTF-32 (aka UCS-4): use 32 bits for each character, in the
endianness of your machine. The most economical is usually UTF-8, which
uses between one and four bytes per character; also, it represents 7-bit
ASCII characters exactly as in ASCII; characters 128-255 of the Latin1 encoding
have the same ordinal position in the scheme but are represented by two
bytes each. Also, of the 3 principal Unicode encodings, UTF-8 is the
only one which isn't subdivided into "big-endian" and "little-endian"
varieties. There is no risk of out-of-phase errors, because of the
allotment of the bytes: 0-0x7F are single-byte characters, 0x80-0xBF are
"trailing bytes" (any byte except the first in a multi-byte character),
0xC0-0xFF are "header bytes" (the first byte in a multibyte character)
and in addition, the header byte specifies how long the sequence is.
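For instance, the oe ligature is code point U+0153, which needs two bytes:

   U+0153             =  00101 010011       (eleven bits)
   two-byte template  =  110xxxxx 10xxxxxx
   result             =  11000101 10010011  =  0xC5 0x93

The first byte announces a two-byte sequence, the second is recognizably a
trailing byte, and neither can be confused with plain ASCII.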
Vim can translate back and forth between Unicode and any other charset
quite easily, so (in gvim, or in Vim running in a Unicode terminal) you
may set 'encoding' to utf-8 and use ":setlocal fileencoding=latin1" for
Western Europe, ":setlocal fileencoding=sjis" for Japanese, etc., on a
file-by-file basis. All those files can coexist in a single instance of
gvim.
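For example (file names invented for the sake of illustration):

:set encoding=utf-8
:e ++enc=latin1 lettre.txt      " read an existing Latin-1 file
:new notes-jp.txt
:setlocal fileencoding=sjis     " this buffer will be written as Shift-JIS
:w

Each buffer remembers its own 'fileencoding' and is converted to and from
UTF-8 transparently.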
By using ":setlocal bomb" on a Unicode file, you can place at its
beginning the codepoint U+FEFF ZERO-WIDTH NO-BREAK SPACE which is then
used by other programs (or by Vim itself) to identify the fact that this
file is in Unicode, and in which particular Unicode encoding and endianness.
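In Vim that is simply

:setlocal bomb
:w

and ":setlocal nobomb" removes the mark again.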
For more details, see
:help mbyte.txt
http://vim.sourceforge.net/tips/tip.php?tip_id=246
section 37 (last) of the Vim FAQ
http://vimdoc.sourceforge.net/htmldoc/vimfaq.html
http://www.unicode.org/
http://www.cl.cam.ac.uk/~mgk25/unicode.html
Best regards,
Tony.