> -----Original Message-----
> From: Jürgen Krämer [mailto:[EMAIL PROTECTED]
> Sent: 06 July 2006 08:01
> To: vim mailing list
> Subject: Re: Irritating column numbers with encoding=utf-8
>
>
> Hi,
>
> Bram Moolenaar wrote:
> >
> > Jürgen Krämer wrote:
> >
> >> with 'encoding' set to "utf-8" there is a quite confusing (to me)
> >> difference between the column number and my expectations (supported
> >> by the virtual column number) if there are non-ASCII characters on
> >> the line. I don't know what the intended meaning of "column count"
> >> and the intended behaviour of "cursor()" are, but it seems they both
> >> depend on the size of the encoded characters. I always thought "nth
> >> column" was more or less a synonym for "nth character on a line"
> >> while "nth virtual column" meant "nth cell on a screen line".
> >>
> [snipped]
> >>
> >> I don't know whether the shown behaviour is a bug or just a feature
> >> I don't like, but in summary I think "column number" should really
> >> represent a character count (i.e., corresponding to what the user
> >> sees), not a byte count depending on the underlying encoding.
> >>
> >> I have seen this behaviour in VIM 6.2, 6.3, 6.4, and 7.0, so
> >> changing the code will definitely introduce an incompatibility. So
> >> the final question is: What do you (Vimmers) and you (Bram) think:
> >> is there a need for a change?
> >
> > I don't know why you call this a column count, in most places it's
> > called a byte count. Perhaps in some places in the docs the remark
> > about this actually being a byte count is missing.
>
> sorry, the "column count" in the first paragraph should have been a
> "column number". I called it so because I have the statusline option
> set to
>
>     %<%f%= [%1*%M%*%{','.&fileformat}%R%Y] [%6l,%4c%V] %3b=0x%02B %P
>
> and noticed that "%4c-%V" displayed two numbers instead of the one I
> expected, because I knew there were no tabs or unprintable characters
> on that line. Even more disturbing was the fact that the first number
> (the column number) was bigger than the second one (the virtual column
> number). So I checked ":help statusline" and it told me
>
>     c N   Column number.
>     v N   Virtual column number.
>     V N   Virtual column number as -{num}.  Not displayed
>           if equal to 'c'.
>
> > You could also want a character count. But what is a character when
> > using composing characters? E.g., when the umlaut is not included in
> > a character but added as a separate composing character?
>
> I would say that a character is what the user sees. Why should he (be
> forced to) know whether "ä" is represented internally as LATIN SMALL
> LETTER A WITH DIAERESIS or as LATIN SMALL LETTER A plus COMBINING
> DIAERESIS? So in my opinion "column count" is equivalent to "character
> count" unless there are characters like tabs and unprintable ones that
> have a special representation -- on the screen, not internally.
>
> > It's not so obvious what to do. In these situations I rather keep
> > it as it is.
>
> I know it's a big change and would introduce incompatibility with older
> versions, but here is another example: Take this line (ignoring the
> leading spaces)
>
>     ääbbcc
>
> and the following commands
>
>     :s/\%3c../xx/
>     %s/^..\zs../xx/
>
> From my point of view they should both replace the 3rd and 4th column
> with "xx". When encoding is set to latin1 they do, but not when it is
> set to utf-8 -- the first one replaces "äb" with "xx". As a user I
> would be really stumped and ask "Why that, it's the same text as
> before."
>
> Changing these commands to
>
>     :s/\%2c../xx/
>     %s/^.\zs../xx/
>
> makes things even more irritating. The second one works as expected,
> now correctly replacing "äb" with "xx", but the first one fails with
> "E486: Pattern not found: \%2c..". Again: Should I (as a user) really
> need to know that \%2c depends on the number of non-ASCII letters in
> front of the column I'm interested in?

Yes, this is indeed very unexpected IMHO and as you say mighty
irritating. I find it very hard to disagree with your arguments. This
should be changed IMHO, even if it surely is a big change.

---Zdenek
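
For what it's worth, the numbers in the quoted "ääbbcc" example can be reproduced outside Vim. The following is only a rough Python sketch, not Vim's actual code: it assumes the line is stored with precomposed "ä" (U+00E4, two bytes in UTF-8, one byte in latin1), and the helper names byte_col and char_at_byte_col are made up for illustration.

```python
line = "ääbbcc"  # precomposed ä (U+00E4), as assumed above


def byte_col(text: str, char_col: int, encoding: str = "utf-8") -> int:
    """Return the 1-based byte column where the 1-based character column starts."""
    return len(text[:char_col - 1].encode(encoding)) + 1


def char_at_byte_col(text: str, col: int, encoding: str = "utf-8"):
    """Return the character that starts at 1-based byte column `col`.

    Returns None when `col` falls inside a multi-byte sequence or past
    the end of the line, i.e. when no character starts there.
    """
    offset = 0
    for ch in text:
        if offset + 1 == col:
            return ch
        offset += len(ch.encode(encoding))
    return None


if __name__ == "__main__":
    # The 3rd character ("b") starts at byte 5 in UTF-8 but at byte 3 in latin1.
    print(byte_col(line, 3, "utf-8"), byte_col(line, 3, "latin-1"))
    # Byte column 3 in UTF-8 is the start of the second "ä".
    print(char_at_byte_col(line, 3, "utf-8"))
    # Byte column 2 in UTF-8 is the middle of the first "ä": nothing starts there.
    print(char_at_byte_col(line, 2, "utf-8"))
```

Read as byte columns, this is consistent with what Jürgen reports: under utf-8, column 3 is where the second "ä" begins (so \%3c.. picks up "äb"), and column 2 lies inside the first "ä", so no match can start there.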