On Mon, Nov 21, 2022 at 3:23 AM Bram Moolenaar <[email protected]> wrote:
> > > Yegappan wrote:
> > >
> > > > The language server protocol messages use character column number
> > > > whereas many of the built-in Vim functions (e.g. matchaddpos()) deal
> > > > with byte column number.
> > > >
> > > > Several built-in functions were added to convert between the character
> > > > and byte column numbers (byteidx(), charcol(), charidx(),
> > > > getcharpos(), getcursorcharpos(), etc.).
> > > > But these functions deal with strings, current cursor position or the
> > > > position of a mark.
> > > >
> > > > We currently don't have a function to return the byte number given the
> > > > character number in a line in a buffer. The workaround is to use
> > > > getbufline() to get the entire buffer line and then use byteidx() to
> > > > get the byte number from the character number.
> > > >
> > > > I am thinking of introducing a new function named charcol2bytecol()
> > > > that accepts a buffer number, line number and the character number in
> > > > the line and returns the corresponding byte number. Any
> > > > suggestions/comments on this?
> > > >
> > > > We should also modify the matchaddpos() function to accept
> > > > character numbers in a line in addition to the byte numbers.
> > >
> > > Just to make sure we understand what we are talking about: This is
> > > always about text in a buffer? Thus the buffer text is somehow passed
> > > through the LSP to a server, which then returns information with
> > > character indexes.
> >
> > Yes. The location information returned by the LSP server is about the
> > text in the buffer.
> >
> > > One detail that matters: Are composing characters counted separately, or
> > > not counted (part of the base character)?
> >
> > I think composing characters are not counted. But I couldn't find this
> > mentioned in the LSP specification:
> >
> > https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#position
>
> Disappointing to not mention such an important part of the interface.
> Since I do not see any mention of composing characters, I would guess
> that each utf-8 character is counted separately.
>
> > > Also, I assume a Tab is counted as just one character, not the number of
> > > display cells it occupies.
> >
> > Yes. Tab is counted as one character.
> >
> > > I wonder if it's really helpful to add a new function if it can
> > > currently be done with two. You already mention that the text can be
> > > obtained with getbufline(), and then get the byte index from the
> > > character index with byteidx(). What is the problem with doing it that
> > > way?
> >
> > If the conversion has to be done too many times then it is not efficient.
>
> How can you say that without trying?
>
I used the attached Vim9 script to measure the performance of
getbufline() + byteidx() compared to calling the col() function. I see that
the first approach takes three times as long to get the column number as
the second one.
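For reference, here is a minimal sketch of the getbufline() + byteidx() workaround discussed above. It is not the attached profile_col.vim script. CharColToByteCol() is a hypothetical helper name, and the sketch assumes Vim 9 with 'encoding' set to utf-8. byteidx() counts a composing character as part of the preceding character. If the server turns out to count composing characters separately, byteidxcomp() would be the function to use instead.

```vim
vim9script

# Hypothetical helper (not an existing Vim function): convert a 1-based
# character column in line lnum of buffer buf to a 1-based byte column.
# Returns -1 if the line is not available or the column is out of range.
def CharColToByteCol(buf: number, lnum: number, charcol: number): number
  # getbufline() returns a list; it is empty if the buffer is not loaded
  # or the line does not exist.
  var lines = getbufline(buf, lnum)
  if empty(lines) || charcol < 1
    return -1
  endif
  # byteidx() takes a 0-based character index and returns a 0-based byte
  # index, or -1 when the index is past the end of the string.  Composing
  # characters are counted with their base character (byteidxcomp()
  # counts them separately).
  var idx = byteidx(lines[0], charcol - 1)
  return idx < 0 ? -1 : idx + 1
enddef

new
setline(1, "\u2345\u2346\u2347\u2348")
echo CharColToByteCol(bufnr('%'), 1, 3)   # 7
```

A Tab is a single character in the line text, so it needs no special handling here. This matches the LSP counting described above.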
>
> Getting the buffer line means making a copy of the text; that's quite cheap.
> The only added overhead is two function calls instead of one, which has
> really minimal impact in the context of all the other things being done.
> Also, if there are multiple positions in one line then getbufline() only
> needs to be called once, thus performance should be very close to whatever
> function we would use instead.
>
> > > Other message:
> > >
> > > > Another alternative is to extend the col() function. The col()
> > > > function currently accepts a list with two numbers (a line number and
> > > > a byte number or "$") and returns the byte number.
> > > > This can be modified to also accept a list with three numbers (line
> > > > number, column number and a boolean indicating character column or
> > > > byte column) and return the byte number.
> > >
> > > I don't like this, the first line for the col() help is:
> > >
> > > The result is a Number, which is the byte index of the column
> > >
> > > When the boolean is true this would be the character index, that is hard
> > > to explain. A user would have to look really hard to find this
> > > functionality.
> >
> > The boolean doesn't change the return value of the col() function. It just
> > changes how the col() function interprets the column number in the list.
> > If it is true, then the col() function will use the column number as the
> > character number. If it is false or not specified, then the col() function
> > will use it as the byte number. In both cases the col() function will
> > always return the byte index of the column.
>
> I was confused. Currently in the [lnum, col] value of {expr} the column
> is the character offset.
>
Currently, in the [lnum, col] value of {expr}, the column is the byte offset.
For example, if you use multibyte characters in a line and get the column
number:
=====================================================
new
call setline(1, "\u2345\u2346\u2347\u2348")
echo col([1, 3])
=====================================================
The above script echoes 3 instead of 7. The byte index of the third
character is 7, because each of these characters takes three bytes in UTF-8.
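To illustrate the point about several positions in one line, here is a minimal sketch that converts multiple character columns with a single getbufline() call, using the same line as in the example above. CharColsToByteCols() is a hypothetical helper name, and the sketch assumes Vim 9 with 'encoding' set to utf-8.

```vim
vim9script

# Hypothetical helper (not an existing Vim function): convert several
# 1-based character columns in one buffer line to 1-based byte columns,
# fetching the line text with getbufline() only once.
def CharColsToByteCols(buf: number, lnum: number, charcols: list<number>): list<number>
  var lines = getbufline(buf, lnum)
  if empty(lines)
    return []
  endif
  var text = lines[0]
  var result: list<number> = []
  for c in charcols
    # byteidx() is 0-based and returns -1 past the end of the string;
    # keep -1 for out-of-range columns.
    var idx = c < 1 ? -1 : byteidx(text, c - 1)
    result->add(idx < 0 ? -1 : idx + 1)
  endfor
  return result
enddef

new
setline(1, "\u2345\u2346\u2347\u2348")
# Each of these characters takes three bytes in UTF-8.
echo CharColsToByteCols(bufnr('%'), 1, [1, 2, 3, 4])   # [1, 4, 7, 10]
```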
Regards,
Yegappan
>
> Since you are converting from character offset
> to byte index, I don't see how you would pass the byte index here, since
> you'll get the same byte index back. What would be the point in passing
> [lnum, col, false]? BTW, leaving out the flag must mean using the
> column number (for backwards compatibility).
>
--
--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
---
You received this message because you are subscribed to the Google Groups
"vim_dev" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/vim_dev/CAAW7x7kgo30XCnF5vEuNY-EKp0k2GbtyQ20%2BXGSL4rT01%2BYdJA%40mail.gmail.com.
Attachment: profile_col.vim
