On Mon, Nov 21, 2022 at 3:23 AM Bram Moolenaar <[email protected]> wrote:
>
>
> Yegappan wrote:
>
> > > > The language server protocol messages use character column number
> > > > whereas many of the built-in Vim functions (e.g. matchaddpos()) deal
> > > > with byte column number.
> > > >
> > > > Several built-in functions were added to convert between the character
> > > > and byte column numbers (byteidx(), charcol(), charidx(),
> > > > getcharpos(), getcursorcharpos(), etc.).
> > > > But these functions deal with strings, current cursor position or the
> > > > position of a mark.
> > > >
> > > > We currently don't have a function to return the byte number given the
> > > > character number in a line in a buffer.  The workaround is to use
> > > > getbufline() to get the entire buffer line and then use byteidx() to
> > > > get the byte number from the character number.
> > > >
> > > > I am thinking of introducing a new function named charcol2bytecol()
> > > > that accepts a buffer number, line number and the character number in
> > > > the line and returns the corresponding byte number.  Any
> > > > suggestions/comments on this?
> > > >
> > > > We should also modify the matchaddpos() function to accept
> > > > character numbers in a line in addition to the byte numbers.
> > >
> > > Just to make sure we understand what we are talking about: This is
> > > always about text in a buffer?  Thus the buffer text is somehow passed
> > > through the LSP to a server, which then returns information with
> > > character indexes.
> >
> > Yes.  The location information returned by the LSP server is about the
> > text in the buffer.
> >
> > > One detail that matters: Are composing characters counted separately, or
> > > not counted (part of the base character)?
> >
> > I think composing characters are not counted.  But I couldn't find this
> > mentioned in the LSP specification:
> >
> > https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#position
>
> Disappointing to not mention such an important part of the interface.
> Since I do not see any mention of composing characters, I would guess
> that each utf-8 character is counted separately.
>
> > > Also, I assume a Tab is counted as just one character, not the number of
> > > display cells it occupies.
> >
> > Yes. Tab is counted as one character.
> >
> > > I wonder if it's really helpful to add a new function if it can
> > > currently be done with two.  You already mention that the text can be
> > > obtained with getbufline(), and then get the byte index from the
> > > character index with byteidx().  What is the problem with doing it that
> > > way?
> >
> > If the conversion has to be done too many times then it is not efficient.
>
> How can you say that without trying?
>

I used the attached Vim9 script to measure the performance of
getbufline() + byteidx() compared to calling the col() function.  The
first approach takes three times longer to get the column number than
the second one.
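
For readers without the attachment, here is a rough sketch of this kind of
timing comparison.  It is not the attached profile_col.vim: the function name
TimeConversions() and the iteration count are made up for illustration, it is
written in legacy Vim script rather than Vim9, and it assumes Vim was built
with the +reltime feature.

```vim
" Hypothetical benchmark sketch; NOT the attached profile_col.vim.
" Returns [seconds for getbufline() + byteidx(), seconds for col()].
function! TimeConversions(count) abort
  new
  call setline(1, "\u2345\u2346\u2347\u2348")
  let bnr = bufnr('')

  " Approach 1: copy the buffer line, then convert a 0-based character
  " index to a 0-based byte index and make it a 1-based column.
  let start = reltime()
  for i in range(a:count)
    let text = getbufline(bnr, 1)[0]
    let byte_col = byteidx(text, 2) + 1
  endfor
  let t_byteidx = reltimefloat(reltime(start))

  " Approach 2: a single col() call.  col() takes a byte column, so this
  " only serves as a baseline for the cost of one built-in call.
  let start = reltime()
  for i in range(a:count)
    let byte_col2 = col([1, 3])
  endfor
  let t_col = reltimefloat(reltime(start))

  bwipe!
  return [t_byteidx, t_col]
endfunction

echo TimeConversions(100000)
```

The absolute numbers depend on the machine and the line length, so only the
ratio between the two values is meaningful.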

>
> Getting the buffer line means making a copy of the text, which is quite
> cheap.  The only added overhead is two function calls instead of one,
> which has really minimal impact in the context of all the other things
> being done.  Also, if there are multiple positions in one line then
> getbufline() only needs to be called once, thus performance should be
> very close to whatever function we would use instead.
>
> > > Other message:
> > >
> > > > Another alternative is to extend the col() function.  The col()
> > > > function currently accepts a list with two numbers (a line number and
> > > > a byte number or "$") and returns the byte number.
> > > > This can be modified to also accept a list with three numbers (line
> > > > number, column number and a boolean indicating character column or
> > > > byte column) and return the byte number.
> > >
> > > I don't like this, the first line for the col() help is:
> > >
> > >         The result is a Number, which is the byte index of the column
> > >
> > > When the boolean is true this would be the character index, that is hard
> > > to explain.  A user would have to look really hard to find this
> > > functionality.
> >
> > The boolean doesn't change the return value of the col() function.  It just
> > changes how the col() function interprets the column number in the list.
> > If it is true, then the col() function will use the column number as the
> > character number.  If it is false or not specified, then the col() function
> > will use it as the byte number.  In both cases the col() function will
> > always return the byte index of the column.
>
> I was confused.  Currently in the [lnum, col] value of {expr} the column
> is the character offset.
>

Currently in the [lnum, col] value of {expr}, the column is the byte offset.
For example, if you use multibyte characters in a line and get the column
number:

=====================================================
new
call setline(1, "\u2345\u2346\u2347\u2348")
echo col([1, 3])
=====================================================

The above script echoes 3 instead of 7.  The byte index of the third
character is 7.
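
For comparison, a minimal sketch of the getbufline() + byteidx() workaround
for the same line.  CharColToByteCol() is a hypothetical helper name used only
for this example (it is not an existing or proposed built-in), and the result
assumes 'encoding' is set to utf-8, where each of these characters takes three
bytes.

```vim
" CharColToByteCol() is a hypothetical helper, for illustration only.
" charcol is a 1-based character column; the result is the 1-based byte
" column, or 0 when the line or the character column does not exist.
function! CharColToByteCol(bufnr, lnum, charcol) abort
  let lines = getbufline(a:bufnr, a:lnum)
  if empty(lines)
    return 0
  endif
  " byteidx() takes and returns 0-based indexes; -1 means out of range.
  let idx = byteidx(lines[0], a:charcol - 1)
  return idx < 0 ? 0 : idx + 1
endfunction

new
call setline(1, "\u2345\u2346\u2347\u2348")
echo CharColToByteCol(bufnr(''), 1, 3)
```

With utf-8 this echoes 7.  Note that byteidx() does not count composing
characters separately; byteidxcomp() would be the one to use if they have to
be counted.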

Regards,
Yegappan

>
> Since you are converting from character offset
> to byte index, I don't see how you would pass the byte index here, since
> you'll get the same byte index back.  What would be the point in passing
> [lnum, col, false] ?  BTW, leaving out the flag must mean using the
> column number (for backwards compatibility).
>

Attachment: profile_col.vim
Description: Binary data