Re: Issue in match() function with multi-byte characters

Andre Sihera Sun, 30 Mar 2014 06:12:29 -0700

On 30/03/14 20:32, Yasuhiro MATSUMOTO wrote:

index(split("こんにちわ世界", "\zs"), "世") should return 5

Now this is interesting.

The split() call does indeed split on character, not byte, boundaries. However,
even if I can do this:

    split("こんにちわ世界", '\zs')

to get this:

    ['こ', 'ん', 'に', 'ち', 'わ', '世', '界']

it still doesn't allow me to search for "世界" (i.e. a word) and get the
answer 5. Instead I have to break my search word into individual characters
and then perform a manual character-by-character comparison in Vim script.

Absolutely no good for performance, especially if I'm processing big text files.
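
One possible way around the per-character loop, as a minimal sketch only: let
stridx() find the byte offset of the word, then count the characters that
precede it with strchars(). The helper name CharIndex() is made up for this
illustration, and the sketch assumes 'encoding' is utf-8 and a Vim that
provides strchars().

```vim
scriptencoding utf-8

" Hypothetical helper: character index of a:needle in a:haystack, or -1.
" Assumes 'encoding' is utf-8 and that strchars() is available.
function! CharIndex(haystack, needle)
  " stridx() returns a byte offset, or -1 when there is no match.
  let l:byte = stridx(a:haystack, a:needle)
  if l:byte < 0
    return -1
  endif
  " Count the characters in the prefix that precedes the match.
  return strchars(strpart(a:haystack, 0, l:byte))
endfunction

echo CharIndex("こんにちわ世界", "世界")
```

With a UTF-8 encoding this should echo 5, since the five characters before the
match occupy 15 bytes. The search itself runs inside stridx() rather than in a
script-level loop, though it is still a workaround and not a character-aware
index().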

Incidentally, checking this yielded yet another inconsistency. The reverse of
index() is the array subscript operator "[...]", which works directly on strings
to get a character, e.g.

                    1111
          01234567890123
    echo "this is a test"[5]

correctly yields "i". However, if I do this:

          ０１２３４５６
    echo "こんにちわ世界"[5]

instead of getting "世" (6th character), it wrongly returns the 6th byte and
gives me "<93>", which I presume is a byte midway through a UTF-8 character
sequence.
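
For comparison, a character-based lookup can be approximated in script today.
This is a sketch under assumptions, not a fix for the subscript operator: the
helper name CharAt() is hypothetical, 'encoding' is assumed to be utf-8, and
newlines in the string are not handled because '.' does not match them.

```vim
scriptencoding utf-8

" Hypothetical helper: the a:n-th character (0-based) of a:str,
" or '' when a:n is out of range.
function! CharAt(str, n)
  " byteidx() maps a character index to the byte offset where it starts.
  let l:byte = byteidx(a:str, a:n)
  if l:byte < 0
    return ''
  endif
  " '.' matches one whole character starting at that byte offset.
  return matchstr(a:str, '.', l:byte)
endfunction

echo CharAt("こんにちわ世界", 5)
```

Under those assumptions this should echo "世", because byteidx() turns the
character index 5 into byte offset 15 before the match is taken.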

This is not good. These inconsistencies need to be fixed.
