Hi,

Bram Moolenaar wrote:
>
> Jürgen Krämer wrote:
>
>> with 'encoding' set to "utf-8" there is a quite confusing (to me)
>> difference between the column number and my expectations (supported by
>> the virtual column number) if there are non-ASCII characters on the
>> line. I don't know what the intended meaning of "column count" and the
>> intended behaviour of "cursor()" are, but it seems they both depend on
>> the size of the encoded characters. I always thought "nth column" was
>> more or less a synonym for "nth character on a line" while "nth virtual
>> column" meant "nth cell on a screen line".
>>
>> [snipped]
>>
>> I don't know whether the shown behaviour is a bug or just a feature I
>> don't like, but in summary I think "column number" should really
>> represent a character count (i.e, corresponding to what the user sees),
>> not a byte count depending on the underlying encoding.
>>
>> I have seen this behaviour in VIM 6.2, 6.3, 6.4, and 7.0, so changing
>> the code will definitely introduce an incompatibility. So the final
>> question is: What do you (Vimmers) and you (Bram) think: is there a need
>> for a change.
>
> I don't know why you call this a column count, in most places it's
> called a byte count. Perhaps in some places in the docs the remark
> about this actually being a byte count is missing.

sorry, the "column count" in the first paragraph should have been a
"column number". I called it that because I have the statusline option
set to

  %<%f%= [%1*%M%*%{','.&fileformat}%R%Y] [%6l,%4c%V] %3b=0x%02B %P

and noticed that "%4c-%V" displayed two numbers instead of the one I
expected, because I knew there were no tabs or unprintable characters on
that line. Even more disturbing was the fact that the first number (the
column number) was bigger than the second one (the virtual column
number). So I checked ":help statusline" and it told me

  c N   Column number.
  v N   Virtual column number.
  V N   Virtual column number as -{num}.  Not displayed if equal to 'c'.

> You could also want a character count. But what is a character when
> using composing characters? E.g., when the umlaut is not included in
> a character but added as a separate composing character?

I would say that a character is what the user sees. Why should he (be
forced to) know whether "ä" is represented internally as LATIN SMALL
LETTER A WITH DIAERESIS or as LATIN SMALL LETTER A plus COMBINING
DIAERESIS? So in my opinion "column count" is equivalent to "character
count" unless there are characters like tabs and unprintable ones that
have a special representation -- on the screen, not internally.

> It's not so obvious what to do. In these situations I rather keep it as
> it is.

I know it's a big change and would introduce incompatibility with older
versions, but here is another example: Take this line (ignoring the
leading spaces)

  ääbbcc

and the following commands

  :s/\%3c../xx/
  %s/^..\zs../xx/

From my point of view they should both replace the 3rd and 4th column
with "xx". When encoding is set to latin1 they do, but not when it is
set to utf-8 -- the first one replaces "äb" with "xx". As a user I would
be really stumped and ask "Why that, it's the same text as before."

Changing these commands to

  :s/\%2c../xx/
  %s/^.\zs../xx/

makes things even more irritating.
The second one works as expected, now correctly replacing "äb" with
"xx", but the first one fails with "E486: Pattern not found: \%2c..".
Again: Should I (as a user) really have to know that \%2c depends on the
number of non-ASCII letters in front of the column I'm interested in?

Regards,
Jürgen

-- 
Jürgen Krämer                              Softwareentwicklung
HABEL GmbH & Co. KG                        mailto:[EMAIL PROTECTED]
Hinteres Öschle 2                          Tel: +49 / 74 61 / 93 53 - 15
78604 Rietheim-Weilheim                    Fax: +49 / 74 61 / 93 53 - 99
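
For anyone who wants to reproduce the byte arithmetic behind the \%3c and
\%2c examples outside Vim, here is a minimal Python sketch. It is not
Vim's implementation. It only assumes that \%Nc addresses the Nth byte of
the line and that a match can begin only where a character starts. The
function names byte_columns and two_chars_at_byte_column are made up for
this illustration.

```python
import unicodedata


def byte_columns(line, encoding="utf-8"):
    """Return a list of (1-based byte column, character) pairs."""
    cols = []
    col = 1
    for ch in line:
        cols.append((col, ch))
        # The next character starts after the encoded bytes of this one.
        col += len(ch.encode(encoding))
    return cols


def two_chars_at_byte_column(line, col, encoding="utf-8"):
    """Model of a byte-column pattern followed by "..".

    Returns the two characters that start at byte column `col`, or None
    when no character starts there (assumption: this corresponds to the
    "Pattern not found" case).
    """
    chars = list(line)
    for i, (start, _) in enumerate(byte_columns(line, encoding)):
        if start == col and i + 2 <= len(chars):
            return "".join(chars[i:i + 2])
    return None


if __name__ == "__main__":
    line = "\u00e4\u00e4bbcc"  # the example line, with precomposed a-umlaut

    # Byte columns 1, 3, 5, 6, 7, 8 under utf-8; 1..6 under latin-1.
    print([c for c, _ in byte_columns(line, "utf-8")])
    print([c for c, _ in byte_columns(line, "latin-1")])

    print(two_chars_at_byte_column(line, 3, "latin-1"))  # bb
    print(two_chars_at_byte_column(line, 3, "utf-8"))    # äb
    print(two_chars_at_byte_column(line, 2, "utf-8"))    # None

    # Composing characters: one visible letter, two code points, three bytes.
    decomposed = "a\u0308"
    print(len(decomposed), len(decomposed.encode("utf-8")))
    print(len(unicodedata.normalize("NFC", decomposed)))
```

Under these assumptions, byte column 2 falls in the middle of the first
"ä" when the line is encoded as utf-8, which is why nothing can start
there, while under latin-1 every character occupies exactly one byte and
the byte column equals the character position.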