cga2000 wrote:
On Mon, Jul 24, 2006 at 08:29:10PM EDT, A.J.Mechelynck wrote:
cga2000 wrote:
On Mon, Jul 24, 2006 at 05:59:42PM EDT, Christian Ebert wrote:
* A.J.Mechelynck on Saturday, July 22, 2006 at 22:40:45 +0200:
The French oe (o, e-dans-l'o) is not defined in the Latin1 encoding, neither in capitals (as for titles or if the word "oeuf" [egg] is the first of a sentence), nor in lowercase. You need UTF-8 for it,
No. Just latin9 or ISO8859-15 (Look at the header of this mail).

Mon coeur.

This is on a Mac with a German keyboard, but actually using an
American keyboard layout. I enter the "oe" with Alt-q (the "Alt"
key on Mac keyboard corresponds to the Modifier key on other
keyboards I believe).
Could this be Mac-specific?
I switched to encoding=latin9.

When I do a Ctrl-K o e and a Ctrl-K O E this is what I get:

½ ¼
confirmed by the :dig command.

I looked carefully at the output of :dig and I couldn't see our elusive
"e dans l'o" either.

So I switched to the French ISO-8859-15, then the US version of
latin9.. still can't find that "o dans l'e".

Strange thing is that the font I use on terminals does have these two
characters (upper/lower case E dans l'O..) in the exact same spot Vim
displays the above fractions..
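The fractions are probably no coincidence. A minimal Python sketch, assuming the digraphs produce bytes 0xBD and 0xBC and that something along the way still interprets them as Latin1 (the helper names `names` and `fits` are made up for this illustration):

```python
import unicodedata


def names(raw, charset):
    """Decode raw bytes under the given charset and name each character."""
    return [unicodedata.name(ch) for ch in raw.decode(charset)]


def fits(text, charset):
    """Return True if every character of text has a slot in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


raw = bytes([0xBC, 0xBD])

# The same two bytes are fractions in Latin1 and OE ligatures in Latin9.
print(names(raw, "iso-8859-1"))
print(names(raw, "iso-8859-15"))

# U+0153 (small ligature oe) has no slot in Latin1 but does in Latin9.
print(fits("\u0153", "iso-8859-1"), fits("\u0153", "iso-8859-15"))
```

So a font laid out for Latin9 and a program that still thinks in Latin1 would disagree on exactly these two positions.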
Try the following (in gvim):

.. with all the goings-on in this thread I never had a chance to
mention the fact that I do not use gvim. I try to do everything in a
terminal (under gnu/screen) because text-mode apps were designed for
the keyboard so they work a lot better than gui's for those of us who
prefer not to use mice.

Gvim accepts the same keyboard commands as console Vim; those who like the mouse can use it too. It has many more colours (typically 16 million rather than 16) and it can change fonts on the fly (from Courier to Lucida to whatever, using only Vim keyboard commands). It can do "real" boldface and italics, as well as straight or curly underlining. And it can use Unicode: see further down.


 :echo has("multi_byte")

the answer should be 1. If it is zero, your version of gvim cannot
handle UTF-8.

Works fine if I switch my locale to UTF-8.  Vim automatically figures
what I want and :dig displays the "o dans l'e" (both the lower and upper
case versions) among a gazillion other digraphs. Then I can use the
usual Ctrl-K oe .. save the file.. pass this on to LaTeX and provided I
have the correct LaTeX statements to activate UTF-8 (that's what took
forever to figure out the other day..) I get my "coeurs", "voeux" and
"boeufs" rendered correctly in xdvi/gv .. *and* the ensuing
printout looks great too.

The problem with this is that I haven't found a comfortable way to
run Vim in UTF-8 mode and the rest of my stuff in 8-bit mode.

Well, in an xterm (or konsole, or Windows Dos Box), console Vim is dependent on whatever charset the console is using. If your xterm (or whatever) is in Latin1, you cannot use French oe any more than you can use Cyrillic or Greek. Gvim, on the other hand, can display anything for which you have a glyph in a font.


Over the week-end I found that I can run Vim in a separate "unicode"
xterm but that's not what I want because I lose screen's copy/paste and
more importantly it destroys my attempt at running a fully integrated
"desktop".

Another problem that I have run into is that text files created when in
UTF-8 mode are a mess when browsed in latin1/9 mode.  I also have
problems when I print "unicode" files.. I once created a nice table
with those box-drawing characters that were available in UTF-8 mode and
it was really nice on-screen.. but when I tried to print it, all I got
was rows and columns of question marks.
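Both symptoms can be reproduced in a few lines of Python. This is only a sketch of one plausible mechanism, assuming the viewer reads UTF-8 bytes as Latin1 and the print path converts to an 8-bit charset with a substitute character; the actual tools involved may behave differently.

```python
text = "c\u0153ur"                          # "coeur" written with the ligature U+0153

utf8_bytes = text.encode("utf-8")           # what a UTF-8 session saves
misread = utf8_bytes.decode("iso-8859-1")   # what a Latin1 viewer makes of it

box = "\u250c\u2500\u2510"                  # box-drawing corner, line, corner
flattened = box.encode("iso-8859-1", errors="replace")

# The ligature became two unrelated characters, so the text grew by one.
print(len(text), len(misread))
# Box-drawing characters have no Latin1 slot; each one is replaced by "?".
print(flattened)
```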

So I switched back to latin1 pending better internationalization support
in some applications (slrn, ELinks.. mutt should work but it's tricky)
and maybe more importantly until I acquire a better understanding of
running a unicode locale in X/linux and the implications thereof..

 :if &tenc=="" | let &tenc = &enc | endif | set enc=utf-8
 :new

then i (set Insert mode) and ^Vu0153 (where ^V is Ctrl-V, unless you
use Ctrl-V to paste, in which case it is Ctrl-Q).

If you see anything other than the oe digraph, then your 'guifont' is
plain wrong. See http://vim.sourceforge.net/tips/tip.php?tip_id=632
about how to choose a better one.

Well.. actually.. I ran some tests in latin-9 earlier.. trying to figure
out this "o dans l'e" business.. that was on a linux console..  and
that's where I realized that I was still running a unicode font.. both
on the linux console and in 'X'.. :-) .. It seems I never switched back
after my brief incursion into unicode territory..  and since I haven't
had any problems displaying and printing text since I switched back.. I
would say that the font is ok..  And that UTF-8 stuff is indeed
backward-compatible?

The font is called "terminus" and I like it a lot because it looks like
a fixed-width version of MS's Verdana, which is my favorite screen font.

see http://wwww.geocities.com/cga9999/wee.png for an excellent
screenshot.

Thanks

cga




So what is Unicode and what is UTF-8?

Unicode is a system to allow using *together* all writing systems known to man. That's a lot. A "character space" with over 1 million slots (code points U+0000 through U+10FFFF) has been set apart for all those characters.

Unicode is also a number of 'encodings' -- manners to represent that data in memory or on a storage medium. The simplest of these encodings is UTF-32 (aka UCS-4): use 32 bits for each character, in the endianness of your machine. The most economical is usually UTF-8, which uses between one and four bytes per character; also, it represents 7-bit ASCII identically as ASCII; characters 128-255 of the Latin1 encoding have the same ordinal position in the scheme but are represented by two bytes each. Also, of the 3 principal Unicode encodings, UTF-8 is the only one which isn't subdivided into "big-endian" and "little-endian" varieties. There is no risk of out-of-phase errors, because of the allotment of the bytes: 0-0x7F are single-byte characters, 0x80-0xBF are "trailing bytes" (any byte except the first in a multi-byte character), 0xC0-0xFF are "header bytes" (the first byte in a multibyte character; of these, 0xC0, 0xC1 and 0xF5-0xFF never occur in well-formed UTF-8) and in addition, the header byte specifies how long the sequence is.
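The byte allotment described above can be sketched in Python. The function names (`byte_class`, `sequence_length`, `next_boundary`) are invented for this illustration, and the code classifies bytes only; it does not validate a stream the way a real decoder would.

```python
def byte_class(b):
    """Classify one byte of a UTF-8 stream, using the ranges given above."""
    if b <= 0x7F:
        return "single"   # 7-bit ASCII, a complete character by itself
    if b <= 0xBF:
        return "trail"    # any byte except the first of a multi-byte character
    return "lead"         # first byte of a multi-byte character


def sequence_length(first):
    """Return the sequence length announced by the first byte."""
    if first <= 0x7F:
        return 1
    if 0xC0 <= first <= 0xDF:
        return 2
    if 0xE0 <= first <= 0xEF:
        return 3
    if 0xF0 <= first <= 0xF7:
        return 4
    # Trail bytes and 0xF8-0xFF do not start a one-to-four-byte sequence.
    raise ValueError("not a valid first byte: 0x%02X" % first)


def next_boundary(data, pos):
    """Starting anywhere in the stream, skip forward to a character start."""
    while pos < len(data) and byte_class(data[pos]) == "trail":
        pos += 1
    return pos


data = "c\u0153ur".encode("utf-8")        # bytes 63 C5 93 75 72
print([byte_class(b) for b in data])
print(next_boundary(data, 2))             # index 2 is the trail byte 0x93
```

Because a trail byte can never be mistaken for the start of a character, a reader dropped into the middle of a stream loses at most one character before it is back in phase.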

Vim can translate back and forth between Unicode and any other charset quite easily, so (in gvim, or in Vim running in a Unicode terminal) you may set 'encoding' to utf-8 and use ":setlocal fileencoding=latin1" for Western Europe, ":setlocal fileencoding=sjis" for Japanese, etc., on a file-by-file basis. All those files can coexist in a single instance of gvim.
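The idea of one internal encoding with a separate charset per file can be illustrated outside Vim. A rough Python sketch, not a description of Vim's internals: text is held as Unicode and converted only when bytes are read or written. The helper `transcode` and the sample strings are made up for the example; "shift_jis" is Python's name for what Vim calls sjis.

```python
def transcode(data, source, target):
    """Re-encode bytes from one charset to another by way of Unicode text."""
    return data.decode(source).encode(target)


french = "caf\u00e9"                 # fits in Latin1
japanese = "\u65e5\u672c\u8a9e"      # three kanji, fits in Shift-JIS

# Each "file" gets its own on-disk encoding; the text itself stays Unicode.
latin1_file = french.encode("iso-8859-1")
sjis_file = japanese.encode("shift_jis")

# Reading the bytes back with the matching charset restores the text.
print(latin1_file.decode("iso-8859-1") == french)
print(sjis_file.decode("shift_jis") == japanese)

# Converting an existing UTF-8 file to Latin1.
print(transcode(b"caf\xc3\xa9", "utf-8", "iso-8859-1"))
```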

By using ":setlocal bomb" on a Unicode file, you can place at its beginning the codepoint U+FEFF ZERO WIDTH NO-BREAK SPACE which is then used by other programs (or by Vim itself) to identify the fact that this file is in Unicode, and in which particular Unicode encoding and endianness.
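How a program might use that mark can be sketched in Python with the BOM constants from the standard `codecs` module. The function `detect_bom` is a hypothetical helper, and the check is a heuristic: a UTF-16 little-endian file whose first character is NUL would be mistaken for UTF-32.

```python
import codecs

# Longer marks first: the UTF-32 LE mark begins with the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return the encoding announced by a leading U+FEFF, or None."""
    for mark, name in BOMS:
        if data.startswith(mark):
            return name
    return None


# U+FEFF written in each encoding yields a different byte pattern.
print(detect_bom("\ufeffabc".encode("utf-8")))
print(detect_bom(codecs.BOM_UTF16_BE + "abc".encode("utf-16-be")))
print(detect_bom(b"plain ascii, no mark"))
```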

For more details, see
  :help mbyte.txt
  http://vim.sourceforge.net/tips/tip.php?tip_id=246
  section 37 (last) of the Vim FAQ: http://vimdoc.sourceforge.net/htmldoc/vimfaq.html
  http://www.unicode.org/
  http://www.cl.cam.ac.uk/~mgk25/unicode.html


Best regards,
Tony.
