Re: line2byte() returns wrong result at multi-byte characters

mattn Mon, 19 Dec 2011 09:30:01 -0800

No, line2byte() should return the byte count within the specified buffer.
A buffer may never be saved as a file. If you want the byte count in an
encoding other than the buffer's internal one, you can use iconv().


  strlen(iconv(getline("."), &encoding, "encoding-you-want"))

iconv() converts text, not numbers, so convert the line and take its strlen().
For example:

  strlen(iconv(getline("."), &encoding, &fileencoding))

And if you want the byte count in the locale's charset:

  strlen(iconv(getline("."), &encoding, "char"))

"char" works as "locale" in libiconv.

Please don't change the behavior of APIs that already exist. If you want
different behavior from line2byte(), please propose a new, separate function.
ex line2byte().
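
A separate function of that kind could be sketched in Vim script roughly as
follows. This is only an illustration, not an existing Vim API: the name
FileLine2Byte() is hypothetical, and it assumes Vim was built with +iconv, that
the file has no BOM, and that the target encoding never produces NUL bytes (so
UTF-16 / UTF-32 are out of scope, because strlen() stops at a NUL).

```vim
" Hypothetical helper: byte offset (1-based, like line2byte()) that line
" {lnum} would have in the file as written with 'fileencoding'.
function! FileLine2Byte(lnum) abort
  " Mirror line2byte(): -1 for an invalid line number.
  if a:lnum < 1 || a:lnum > line('$') + 1
    return -1
  endif
  " An empty 'fileencoding' means the file uses 'encoding'.
  let l:fenc = empty(&fileencoding) ? &encoding : &fileencoding
  " Assumption: 2 bytes per line ending for ff=dos, otherwise 1.
  let l:eol = &fileformat ==# 'dos' ? 2 : 1
  let l:offset = 1
  " getline(1, 0) returns an empty list, so line 1 yields offset 1.
  for l:text in getline(1, a:lnum - 1)
    " iconv() returns an empty string when the conversion fails, so the
    " result is only meaningful for text the target encoding can represent.
    let l:offset += strlen(iconv(l:text, &encoding, l:fenc)) + l:eol
  endfor
  return l:offset
endfunction
```

Calling `:echo FileLine2Byte(line('.'))` next to `:echo line2byte('.')` shows
the two offsets side by side. The existing line2byte() stays untouched.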


On Monday, December 19, 2011 11:15:28 PM UTC+9, Дмитрий Франк wrote:
>
> It would be great if line2byte() could return the file offset instead
> of the internal offset (an optional flag seems like a good solution).
>
> Regards,
> Dmitry.
>
> On 19 December 2011 at 18:04, Ingo Karkat <sw...@ingo-karkat.de> wrote:
>
>> On 19-Dec-2011 14:40, Дмитрий Франк wrote:
>>
>> > On 19 December 2011 at 17:03, Ingo Karkat <sw...@ingo-karkat.de> wrote:
>> >
>> >     On 19-Dec-2011 13:35, Дмитрий Франк wrote:
>> >
>> >     > Citation from help: "Return the *byte count* from the start of the
>> >     > buffer for line {lnum}"
>> >     >
>> >     > The returned *byte count* is wrong. It returns the character count
>> >     > instead of the byte count.
>> >
>> >     I cannot reproduce this, neither with Vim 7.3.0 on Windows/x64, nor
>> >     with Vim 7.3.353 on Linux/x86:
>> >
>> >     $ vim -N -u NONE --cmd "set enc=utf-8" -c "call setline(1, ['foobaN', ''])" -c "2|echo line2byte('.')"
>> >     8
>> >     $ vim -N -u NONE --cmd "set enc=utf-8" -c "call setline(1, ['fooba'.nr2char(1049), ''])" -c "2|echo line2byte('.')"
>> >     9
>> >
>> >     Please post your Vim version, and steps to reproduce.
>> >
>> >     -- regards, ingo
>> >
>> >     PS: Please bottom-post on vim_dev.
>> >
>> >
>> > I use Windows, and I have to keep Vim's encoding set to cp1251 (the
>> > standard encoding for Russian Windows).
>> >
>> > Vim 7.3.46 on Windows/x86
>> >
>> > $ vim -N -u NONE --cmd "set enc=cp1251 | set fenc=utf-8 | set ff=unix" -c "call setline(1,['foobaN', ''])" -c "2|echo line2byte('.')"
>> > 8
>> > $ vim -N -u NONE --cmd "set enc=cp1251 | set fenc=utf-8 | set ff=unix" -c "call setline(1,['fooba'.nr2char(1049), ''])" -c "2|echo line2byte('.')"
>> > 8
>> >
>> > It seems line2byte() looks at &encoding, but it should look at
>> > &fileencoding.
>>
>> Your analysis looks right, and probably doesn't surprise the devs, because
>> internally Vim always uses 'encoding' to represent the buffer (and only
>> converts to 'fileencoding' during writes).
>>
>> This raises the question of how line2byte() (and the go / :goto commands)
>> should behave. I would side with you, using byte counts of the file, not the
>> internal representation (especially because for all Unicode encodings, Vim
>> internally uses UTF-8, so it wouldn't be possible to jump to UTF-16 / UTF-32
>> offsets even by setting 'encoding' to it).
>>
>> :help line2byte() indirectly supports this (*emphasis* mine):
>> > This can also be used to get the byte count for the line just
>> > below the last line: >
>> >       line2byte(line("$") + 1)
>> > *This is the file size plus one.*
>>
>> I think this issue needs at least a note in the documentation, and I wonder
>> whether it's feasible to implement it the way you suggest. (For maximum
>> flexibility, line2byte() could take an optional flag selecting whether the
>> file offset or the internal offset is wanted.)
>>
>> -- regards, ingo
>>
>> --
>> You received this message from the "vim_dev" maillist.
>> Do not top-post! Type your reply below the text you are replying to.
>> For more information, visit http://www.vim.org/maillist.php
>>
>
>