On 30/03/14 20:32, Yasuhiro MATSUMOTO wrote:
index(split("こんにちわ世界", '\zs'), "世") should return 5
Now this is interesting.
split() with '\zs' does indeed break a string at character boundaries,
not byte boundaries, so index() on the result counts characters.
However, even though I can do this:
split("こんにちわ世界", '\zs')
to get this:
['こ', 'ん', 'に', 'ち', 'わ', '世', '界']
it still doesn't let me search for "世界" (i.e. a whole word) and get
the answer 5. Instead I have to break my search word into individual
characters and compare them one by one, in Vim script. That is no good
for performance, especially when processing large text files.
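A small Python sketch (not Vim script, and not a description of how Vim works internally) of the offsets involved, assuming UTF-8 encoding. Each character of "こんにちわ" takes three bytes in UTF-8, so "世界" starts at character 5 but at byte 15. The helper byte_to_char_offset is a hypothetical name introduced only for this illustration.

```python
# Python illustration of byte vs character offsets when searching
# for "世界" inside "こんにちわ世界" (UTF-8 assumed).
text = "こんにちわ世界"
word = "世界"

# Python str indexing is by character (code point), so this is the
# character position the post is asking for.
char_index = text.index(word)

# Searching the raw UTF-8 bytes instead gives a byte offset; each of
# the five preceding characters occupies 3 bytes.
data = text.encode("utf-8")
byte_index = data.index(word.encode("utf-8"))


def byte_to_char_offset(data: bytes, byte_offset: int) -> int:
    """Count the characters in the first byte_offset bytes.

    Hypothetical helper: assumes byte_offset falls on a character
    boundary, otherwise decode() raises UnicodeDecodeError.
    """
    return len(data[:byte_offset].decode("utf-8"))


print(char_index)                             # 5
print(byte_index)                             # 15
print(byte_to_char_offset(data, byte_index))  # 5
```

The conversion has to look at every byte before the match, because UTF-8 characters vary in length; a byte offset alone does not say how many characters precede it.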
Incidentally, checking this revealed yet another inconsistency. The
reverse of index() is the array subscript operator "[...]", which works
directly on strings to get a character. For example:
                1111
      01234567890123
echo "this is a test"[5]
correctly yields "i". However, if I do this:
      0 1 2 3 4 5 6
echo "こんにちわ世界"[5]
instead of getting "世" (the 6th character), it wrongly returns the 6th
byte and gives me "<93>". That is the last byte of the three-byte UTF-8
sequence for "ん" (E3 82 93), i.e. a byte from the middle of a
character, not a character.
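A similar Python sketch (again not Vim script) shows where "<93>" comes from, assuming the string is UTF-8 encoded: "こ" is E3 81 93 and "ん" is E3 82 93, so byte 5 is the final continuation byte of "ん", while character 5 is "世". The function is_continuation_byte is a hypothetical helper for this illustration only.

```python
# Python illustration: character 5 vs byte 5 of "こんにちわ世界".
text = "こんにちわ世界"
data = text.encode("utf-8")

# Character-based subscript: the 6th character.
print(text[5])                       # 世

# Byte-based subscript, like the behaviour described above: the 6th byte.
print(hex(data[5]))                  # 0x93

# "ん" (U+3093) encodes as E3 82 93, occupying bytes 3..5.
print("ん".encode("utf-8").hex())    # e38293


def is_continuation_byte(b: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (b & 0xC0) == 0x80


print(is_continuation_byte(data[5]))  # True
```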
This is not good. These inconsistencies need to be fixed.
--
You received this message from the "vim_dev" maillist.