Re: Irritating column numbers with encoding=utf-8

Jürgen Krämer Wed, 05 Jul 2006 23:12:04 -0700

Hi,

Bram Moolenaar wrote:
>
> Jürgen Krämer wrote:
>
>> with 'encoding' set to "utf-8" there is a quite confusing (to me)
>> difference between the column number and my expectations (supported by
>> the virtual column number) if there are non-ASCII characters on the
>> line. I don't know what the intended meaning of "column count" and the
>> intended behaviour of "cursor()" are, but it seems they both depend on
>> the size of the encoded characters. I always thought "nth column" was
>> more or less a synonym for "nth character on a line" while "nth virtual
>> column" meant "nth cell on a screen line".
>>
[snipped]
>>
>> I don't know whether the shown behaviour is a bug or just a feature I
>> don't like, but in summary I think "column number" should really
>> represent a character count (i.e., corresponding to what the user sees),
>> not a byte count depending on the underlying encoding.
>>
>> I have seen this behaviour in VIM 6.2, 6.3, 6.4, and 7.0, so changing
>> the code will definitely introduce an incompatibility. So the final
>> question is: What do you (Vimmers) and you (Bram) think: is there a need
>> for a change.
>
> I don't know why you call this a column count, in most places it's
> called a byte count.  Perhaps in some places in the docs the remark
> about this actually being a byte count is missing.


Sorry, the "column count" in the first paragraph should have been
"column number". I called it that because I have the statusline option
set to

  %<%f%= [%1*%M%*%{','.&fileformat}%R%Y] [%6l,%4c%V] %3b=0x%02B %P

and noticed that "%4c%V" displayed two numbers instead of the one I
expected, because I knew there were no tabs or unprintable characters
on that line. Even more disturbing was the fact that the first number
(the column number) was bigger than the second one (the virtual column
number). So I checked ":help statusline" and it told me

        c N   Column number.
        v N   Virtual column number.
        V N   Virtual column number as -{num}.  Not displayed if equal to 'c'.

> You could also want a character count.  But what is a character when
> using composing characters?  E.g., when the umlaut is not included in
> a character but added as a separate composing character?

I would say that a character is what the user sees. Why should he (be
forced to) know whether "ä" is represented internally as LATIN SMALL
LETTER A WITH DIAERESIS or as LATIN SMALL LETTER A plus COMBINING
DIAERESIS? So in my opinion "column count" is equivalent to "character
count" unless there are characters like tabs and unprintable ones that
have a special representation -- on the screen, not internally.
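To make the two representations concrete, here is a small Python sketch
(not Vim code; the helper name describe() is made up for illustration).
It assumes that "what the user sees" can be approximated by counting
every code point that is not a combining mark, which is a simplification:

```python
import unicodedata

precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # LATIN SMALL LETTER A + COMBINING DIAERESIS


def describe(text):
    """Return (code points, UTF-8 bytes, visible characters) for text.

    "Visible characters" is approximated by skipping combining marks,
    so a base letter plus its diaeresis counts as one character.
    """
    code_points = len(text)
    utf8_bytes = len(text.encode("utf-8"))
    visible = sum(1 for ch in text if unicodedata.combining(ch) == 0)
    return code_points, utf8_bytes, visible


print(describe(precomposed))
print(describe(decomposed))
# Both spellings normalize to the same precomposed form under NFC.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Both strings look like one "ä" on the screen, yet they differ in code
point count and in byte length, so only the visible count matches what
the user sees.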

> It's not so obvious what to do.  In these situations I rather keep it as
> it is.

I know it's a big change and would introduce incompatibility with older
versions, but here is another example: Take this line (ignoring the
leading spaces)

  ääbbcc

and the following commands

  :s/\%3c../xx/
  :%s/^..\zs../xx/

From my point of view they should both replace the 3rd and 4th column
with "xx". When encoding is set to latin1 they do, but not when it is
set to utf-8 -- the first one replaces "äb" with "xx". As a user I would
be really stumped and ask "Why that, it's the same text as before."

Changing these commands to

  :s/\%2c../xx/
  :%s/^.\zs../xx/

makes things even more irritating. The second one works as expected, now
correctly replacing "äb" with "xx", but the first one fails with "E486:
Pattern not found: \%2c..". Again: Should I (as a user) really need to
know that \%2c depends on the number of non-ASCII letters in front of
the column I'm interested in?
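
The arithmetic behind both examples can be sketched in Python. This only
models the byte-column behaviour described above and is not how Vim
implements \%Nc; the function names are invented for this illustration:

```python
def byte_columns(line, encoding="utf-8"):
    """Pair each character with the 1-based byte column where it starts."""
    columns = []
    col = 1
    for ch in line:
        columns.append((col, ch))
        col += len(ch.encode(encoding))
    return columns


def chars_at_byte_column(line, col, count=2, encoding="utf-8"):
    """Return `count` characters starting at byte column `col`.

    Returns None when no character starts at that byte column (or when
    fewer than `count` characters follow), mirroring "pattern not found".
    """
    for index, (start, _) in enumerate(byte_columns(line, encoding)):
        if start == col:
            if index + count > len(line):
                return None
            return line[index:index + count]
    return None


line = "ääbbcc"
print(byte_columns(line))                        # where each character starts
print(chars_at_byte_column(line, 3))             # like \%3c.. with utf-8
print(chars_at_byte_column(line, 2))             # like \%2c.. with utf-8
print(chars_at_byte_column(line, 3, encoding="latin-1"))
```

With utf-8 each "ä" occupies two bytes, so byte column 3 is the start of
the second "ä" and byte column 2 falls inside the first one; with latin1
every character is one byte and byte columns coincide with character
positions.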

Regards,
Jürgen

-- 
Jürgen Krämer                              Softwareentwicklung
HABEL GmbH & Co. KG                        mailto:[EMAIL PROTECTED]
Hinteres Öschle 2                          Tel: +49 / 74 61 / 93 53 - 15
78604 Rietheim-Weilheim                    Fax: +49 / 74 61 / 93 53 - 99
