On 30/03/14 20:32, Yasuhiro MATSUMOTO wrote:
> index(split("こんにちわ世界", '\zs'), "世") should return 5


Now this is interesting.

split() with '\zs' does indeed work on character boundaries, not byte
boundaries, so index() on the resulting list gives a character position.
However, even though I can do this:

    split("こんにちわ世界", '\zs')

to get this:

    ['こ', 'ん', 'に', 'ち', 'わ', '世', '界']

it still doesn't let me search for "世界" (a whole word, not a single
character) and get the answer 5. Instead, I have to break my search word into
individual characters and then compare them one by one, in Vim script. That is
no good for performance, especially when processing big text files.
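A possible workaround, offered only as a sketch and assuming 'encoding' is
utf-8: let stridx() do the fast substring search, which returns a byte offset,
and then turn that offset into a character offset by counting the characters
in front of it with strchars(). CharIndex() is a hypothetical helper name, not
a Vim builtin; stridx(), strpart() and strchars() are existing functions
(strchars() needs Vim 7.3 or later). This does not remove the inconsistency,
it only wraps it in one extra call.

```vim
" Hypothetical helper (not a builtin): character index of a:needle in
" a:haystack, or -1 when it is not found, like stridx().
" Assumes 'encoding' is utf-8.
function! CharIndex(haystack, needle) abort
  " stridx() does the search but returns a byte offset.
  let l:bytepos = stridx(a:haystack, a:needle)
  if l:bytepos < 0
    return -1
  endif
  " Count the characters before the match to get a character offset.
  return strchars(strpart(a:haystack, 0, l:bytepos))
endfunction

echo CharIndex("こんにちわ世界", "世界")
```

With the string above this echoes 5, the position of "世界" counted in
characters rather than bytes.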

Incidentally, looking into index() turned up yet another inconsistency. The
counterpart of index() is the subscript operator "[...]", which also works
directly on strings to get a character, e.g.:

                    1111
          01234567890123
    echo "this is a test"[5]

correctly yields "i". However, if I do this:

          0 1 2 3 4 5 6
    echo "こんにちわ世界"[5]

instead of getting "世" (the 6th character), it wrongly returns the 6th byte
and gives me "<93>", a continuation byte from the middle of a UTF-8 sequence:
"こ" and "ん" are each encoded as three bytes (E3 81 93 and E3 82 93), so byte
5 is the last byte of "ん".
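Likewise, a sketch for getting the Nth character rather than the Nth byte
(again assuming 'encoding' is utf-8): byteidx() translates a character index
into a byte index, and matchstr() with its {start} argument then extracts the
single character, '.', found there. CharAt() is a hypothetical helper name;
byteidx() and matchstr() are existing builtins. The last line contrasts
strlen() (bytes) with strchars() (characters).

```vim
" Hypothetical helper (not a builtin): the a:nr'th character (0-based) of
" a:str, or an empty string when a:nr is out of range.
" Assumes 'encoding' is utf-8.
function! CharAt(str, nr) abort
  " byteidx() returns the byte index of character a:nr, or -1 if out of range.
  let l:start = byteidx(a:str, a:nr)
  if l:start < 0
    return ''
  endif
  " '.' matches one complete (possibly multibyte) character at l:start.
  return matchstr(a:str, '.', l:start)
endfunction

echo CharAt("こんにちわ世界", 5)
" strlen() counts bytes, strchars() counts characters.
echo strlen("こんにちわ世界") strchars("こんにちわ世界")
```

The first echo gives "世"; the second gives "21 7", since each of the seven
characters takes three bytes in UTF-8.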

This is not good. These inconsistencies need to be fixed.

--
--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

--- You received this message because you are subscribed to the Google Groups "vim_dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to vim_dev+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
