Hi,

With 'encoding' set to "utf-8", the column number differs from what I
expect (and from the virtual column number) whenever there are
non-ASCII characters on the line, which I find quite confusing. I don't
know what the intended meaning of "column number" and the intended
behaviour of "cursor()" are, but both seem to depend on the byte size of
the encoded characters. I always thought "nth column" was more or less a
synonym for "nth character on a line", while "nth virtual column" meant
"nth cell on a screen line".

Here is how to reproduce the observed behaviour. Start Vim with

  vim -u NONE -U NONE

and enter

  :set encoding=utf-8
  :set laststatus=2
  :set statusline=[%c/%v]

(The last line tells VIM to display the column and the virtual column.)
Now enter two lines

  abc
  äbc

(The first letter in the second line is a lower-case "a" with umlaut.)
While moving the cursor over the different characters on the first line
the status line shows "[1/1]", "[2/2]", and "[3/3]", respectively,
telling you that "column" and "virtual column" are equal. That is the
expected behaviour as long as there are no special characters like tabs
and non-printable characters.

Now move the cursor over the characters in the second line. While the
cursor is over the "ä", "[1/1]" is displayed, but the next characters
result in "[3/2]" and "[4/3]", respectively. It seems as if "ä" (or any
non-ASCII character, for that matter) accounts for (at least) two
columns while 'encoding' is set to "utf-8". Although I know that "ä" is
represented by two bytes in UTF-8, I find this behaviour confusing
because on the surface it is only one character. It gets even worse
(IMHO) with characters that need three bytes in UTF-8, like LATIN
CAPITAL LETTER A WITH DOT BELOW (U+1EA0), which increase the column
number by three.
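
The byte counts behind these numbers can be checked from the command
line. This is only a small sketch; it assumes 'encoding' is "utf-8", that
the "ä" is typed as a single precomposed character, and that strlen()
counts bytes as described under ":help strlen()":

  " 1 byte
  :echo strlen("a")
  " 2 bytes
  :echo strlen("ä")
  " 3 bytes (U+1EA0)
  :echo strlen("\u1ea0")
  " 3 characters: each character is replaced by one byte before counting
  :echo strlen(substitute("äbc", ".", "x", "g"))

(The comments are on separate lines because ":echo" would treat a
trailing double quote as the start of a string.)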

The "cursor()" function interprets non-ASCII characters in the same
way. Both

  call cursor(2, 1)

and

  call cursor(2, 2)

place the cursor on "ä". To place it on "b" you need to

  call cursor(2, 3)

although I would have expected the second call to place the cursor on
"b" already.
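
For scripts, a small wrapper could translate a character position into
the byte column that "cursor()" apparently expects. The following is only
a sketch: "CursorChar" is a made-up name, and it assumes that byteidx()
is available in the Vim version at hand. Note that byteidx() counts
composing characters separately, so lines containing them may still be
off.

  " Hypothetical helper, not part of Vim: move to the a:charcol'th
  " character (1-based) of line a:lnum.
  function! CursorChar(lnum, charcol)
    " byteidx() maps a 0-based character index to a 0-based byte index
    " and returns -1 if the line has fewer characters.
    let idx = byteidx(getline(a:lnum), a:charcol - 1)
    if idx >= 0
      " cursor() takes a 1-based byte column.
      call cursor(a:lnum, idx + 1)
    endif
  endfunction

  " With the example lines above this should place the cursor on "b".
  call CursorChar(2, 2)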

I can think of two ways to circumvent this problem:

  1) switching to "encoding=latin1", which is not always an option
     because of the need for characters outside the scope of latin1;

  2) using only virtual column numbers in the status line, but this
     gives different results when characters like tab or non-printables
     are displayed in more than one screen cell (which is of course
     reasonable).
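
Something in between might be to compute a character-based column by
hand and show it next to the other two. This is a rough sketch only:
"CharCol" is a made-up function name, and the counting trick is the one
suggested under ":help strlen()".

  " Hypothetical helper: 1-based character index of the cursor on its line.
  function! CharCol()
    " Text before the cursor, cut at the byte offset reported by col().
    let before = strpart(getline('.'), 0, col('.') - 1)
    " Replace every character by a single byte, then count the bytes.
    return strlen(substitute(before, '.', 'x', 'g')) + 1
  endfunction

  set statusline=[%c/%v/%{CharCol()}]

On the "äbc" line the third number should then read 1, 2, and 3, while a
tab still counts as a single character.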

I don't know whether the behaviour shown is a bug or just a feature I
don't like, but in summary I think "column number" should really
represent a character count (i.e., correspond to what the user sees),
not a byte count that depends on the underlying encoding.

I have seen this behaviour in Vim 6.2, 6.3, 6.4, and 7.0, so changing
the code would definitely introduce an incompatibility. So the final
question is: What do you (Vimmers) and you (Bram) think? Is there a need
for a change?

Regards,
Jürgen

-- 
Jürgen Krämer                              Softwareentwicklung
HABEL GmbH & Co. KG                        mailto:[EMAIL PROTECTED]
Hinteres Öschle 2                          Tel: +49 / 74 61 / 93 53 - 15
78604 Rietheim-Weilheim                    Fax: +49 / 74 61 / 93 53 - 99
