Charles E Campbell Jr wrote:
> Hello!
> 
> I've received a couple of requests about getting Align.vim to work with 
> utf-8 characters.  As an example, consider:
> 
> let x='grün'
> echo "strlen(x)=".strlen(x)
> 
> Thus, strlen() returns 5, not 4 as one might (sometimes) expect.  So, I 
> tried a workaround:
> 
> fun! Strlen(x)
>   1split
>   enew
>   call setline(1,a:x)
>   let ret= virtcol("$") - 1
>   bwipe!
>   return ret
> endfun
> 
> echo Strlen(x)
> 
> now returns 4 (at the price of using interpreted code over built-in 
> strlen()).  So, is this the best that can be done?
> I'd prefer to have a built-in compiled function for this.
> 
> Regards,
> Chip Campbell

It all depends on what exactly you want to do. (I haven't read the Align.vim 
docs.) The length of a UTF-8 string can be counted in several nonequivalent 
ways:

- number of bytes (Latin a + combining circumflex is three bytes):
        strlen(string)

- number of codepoints (Latin a + combining circumflex is two codepoints):
        strlen(substitute(string, '.', 'x', 'g'))

- number of spacing codepoints (Latin a + combining circumflex is one spacing 
codepoint; a hard tab is one; wide and narrow CJK are one each; etc.); this 
one is untested:
        strlen(substitute(string, '.\Z', 'x', 'g'))

- virtual length, i.e. the number of screen columns (counting, for instance, 
tabs as anything between 1 and 'tabstop', wide CJK as 2 rather than 1, Arabic 
alif as zero when immediately preceded by lam and one otherwise, etc.): so 
many different things can happen here that I guess something like what you're 
doing above will be necessary.

The first two above are documented at ":help strlen()", the third (in 
addition) at ":help patterns-composing".
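
If your Vim happens to have them, the built-in functions strchars(), 
strwidth() and strdisplaywidth() may spare you the scratch-buffer trick. They 
are not present in every version, so the sketch below checks for each with 
exists() before calling it. This is only a minimal illustration under that 
assumption; LengthReport() is a name I made up, not something from Align.vim.

```vim
" Sketch only: collect several "lengths" of a string in one dictionary.
" Assumes 'encoding' is utf-8. LengthReport() is a hypothetical helper.
function! LengthReport(str)
  let l:report = {}

  " Number of bytes.
  let l:report.bytes = strlen(a:str)

  " Number of matches of '.', the expression given at ":help strlen()".
  let l:report.dot_matches = strlen(substitute(a:str, '.', 'x', 'g'))

  " The remaining entries are filled in only where the built-in exists.
  if exists('*strchars')
    " Characters, with composing characters counted separately.
    let l:report.chars = strchars(a:str)
  endif
  if exists('*strwidth')
    " Display cells, with a tab counted as one cell.
    let l:report.cells = strwidth(a:str)
  endif
  if exists('*strdisplaywidth')
    " Display cells when starting at column 0, tabs expanded per 'tabstop'.
    let l:report.display = strdisplaywidth(a:str)
  endif

  return l:report
endfunction

echo LengthReport('grün')
```

strwidth() counts a tab as one cell while strdisplaywidth() expands it 
according to 'tabstop', so which of the two suits Align.vim depends on whether 
the text being aligned can contain tabs.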


Best regards,
Tony.

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
