Charles E Campbell Jr wrote:
> Hello!
>
> I've received a couple of requests about getting Align.vim to work with
> utf-8 characters. As an example, consider:
>
> let x='grün'
> echo "strlen(x)=".strlen(x)
>
> Thus, strlen() returns 5, not 4 as one might (sometimes) expect. So, I
> tried a workaround:
>
> fun! Strlen(x)
> 1split
> enew
> call setline(1,a:x)
> let ret= virtcol("$") - 1
> bwipe!
> return ret
> endfun
>
> echo Strlen(x)
>
> now returns 4 (at the price of using interpreted code over built-in
> strlen()). So, is this the best that can be done?
> I'd prefer to have a built-in compiled function for this.
>
> Regards,
> Chip Campbell
It all depends on what exactly you want to do. (I haven't read the Align.vim
docs.) The length of a UTF-8 string can be counted in several nonequivalent
ways:
- number of bytes (Latin a + combining circumflex is three bytes):
      strlen(string)
- number of codepoints (Latin a + combining circumflex is two codepoints):
      strlen(substitute(string, '.', 'x', 'g'))
- number of spacing codepoints (Latin a + combining circumflex is one spacing
  codepoint; a hard tab is one; wide and narrow CJK are one each; etc.)
  (untested):
      strlen(substitute(string, '.\Z', 'x', 'g'))
- virtual length (counting, for instance, tabs as anything between 1 and
  'tabstop', wide CJK as 2 rather than 1, Arabic alif as zero when immediately
  preceded by lam, one otherwise, etc.): I guess something like what you're
  doing above will be necessary because of the wide range of things that can
  happen.
The first two above are documented at ":help strlen()", the third (in
addition) at ":help patterns-composing".
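
As a rough sketch only: in a Vim that provides the built-in strchars() and
strdisplaywidth() functions (an assumption, later versions have them; see
":help strchars()"), and with 'encoding' set to utf-8, the four counts for
Latin a + combining circumflex can be compared directly. The variable name s
is just for illustration.

```vim
" Assumes 'encoding' is utf-8 and that strchars() / strdisplaywidth() exist.
" Latin a followed by U+0302 COMBINING CIRCUMFLEX ACCENT
let s = "a\u0302"

" number of bytes: 3
echo strlen(s)

" number of codepoints, composing characters counted separately: 2
echo strchars(s)

" number of characters with composing characters skipped: 1
echo strchars(s, 1)

" number of screen cells the string occupies: 1
echo strdisplaywidth(s)
```

For strings containing tabs, strdisplaywidth() depends on 'tabstop' and on the
starting column, so its result is not a fixed property of the string.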
Best regards,
Tony.
--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---