Yegappan wrote:
> @yegappan pushed 2 commits.
>
> 87c7f0f888bd61604659930276973374dc408e92 Add the utf16idx() function
> and add UTF-16 flag to the byteidx() and byteidxcomp() functions
> 84147e31e7f05403bfaab20ccb7689c74a87befb Add support for converting
> from byte or character index in a string to UTF-16 index and vice
> versa

This looks like the right way to do this, but I find the help a bit
difficult to interpret. I hope others, especially those who want to use
the functionality, have a good look and make comments if something is
missing or unclear.

For byteidx() there is an extra argument, which, when TRUE, changes how
the {nr} argument is used:

    When {utf16} is TRUE, {nr} is used as the UTF-16 index in the
    String {expr} instead of as the character index.

The first thing that is unclear: what is "the UTF-16 index"? In the
context of the discussion we had I can understand it is the index in the
string when it is encoded with UTF-16, thus with 16-bit words. This
should be explained better. I do not expect many to understand what
UTF-16 encoding means.

The examples are supposed to help understand this:

    echo byteidx('a😊😊', 2)       returns 5
    echo byteidx('a😊😊', 2, 1)    returns 1

However, this raises questions: why does the second call return 1?

For the first call I can compute the result: when {nr} is 2 the index
of the third character is returned, thus the bytes of the first two
characters are added together. These are 1 and 4, total 5. You can see
the second character is 4 bytes by using "g8" on it.

With the second call the second character would take two UTF-16 words.
With {nr} being 2 we refer to the third UTF-16 word, thus halfway
through the second character. This is apparently rounded down and only
the one byte for "a" is counted. This rounding down is new and should
be explained.

Perhaps adding this explanation of how the two examples work is
sufficient.

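
The arithmetic behind these two results can be modelled outside Vim. What follows is a minimal Python sketch, not Vim's implementation: the function names are made up for illustration, and the handling of a negative index or an index at or past the end of the string is an assumption of this sketch.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit words needed to encode one character in UTF-16."""
    return 2 if ord(ch) > 0xFFFF else 1


def byte_index_from_char_index(s: str, nr: int) -> int:
    """UTF-8 byte offset of the character with character index nr."""
    chars = list(s)
    if nr < 0 or nr > len(chars):
        return -1  # assumption: out-of-range indexes give -1
    # Add up the UTF-8 byte lengths of the characters before index nr.
    return sum(len(ch.encode("utf-8")) for ch in chars[:nr])


def byte_index_from_utf16_index(s: str, nr: int) -> int:
    """UTF-8 byte offset of the character that contains UTF-16 word nr.

    An index pointing at the second word of a two-word character is
    rounded down to the start of that character.
    """
    if nr < 0:
        return -1  # assumption: negative indexes give -1
    byte_off = 0  # bytes consumed so far
    unit_off = 0  # UTF-16 words consumed so far
    for ch in s:
        units = utf16_units(ch)
        if nr < unit_off + units:
            return byte_off  # nr falls inside this character
        byte_off += len(ch.encode("utf-8"))
        unit_off += units
    # assumption: an index exactly at the end gives the byte length
    return byte_off if nr == unit_off else -1


text = "a\U0001F60A\U0001F60A"  # 'a' followed by two smiley characters
print(byte_index_from_char_index(text, 2))   # 5: 1 byte for "a" + 4 bytes
print(byte_index_from_utf16_index(text, 2))  # 1: rounded down to 2nd char
```

In this model "a" takes one UTF-16 word and each smiley takes two, so UTF-16 word 2 is the second half of the first smiley and the result is the byte offset where that smiley starts.
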
It would still be good to add a third call that is more likely to
happen:

    echo byteidx('a😊😊', 3, 1)    returns 5

This refers to the same character as the first call, thus has the same
return value. This also makes clear (especially for those who don't know
UTF-16 well) that a character can consist of two words.

For charidx() there is this example:

    echo charidx('a😊😊', 4, 0, 1)    returns 3

I would think that index 4 is halfway through the third character, thus
I would expect a return value of 2. Am I wrong?

-- 
"I know that there are people who don't love their fellow man,
and I hate those people!" - Tom Lehrer

 /// Bram Moolenaar -- b...@moolenaar.net -- http://www.Moolenaar.net   \\\
///                                                                      \\\
\\\        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///