Kazunobu Kuriyama wrote:

> > > 2016-04-25 5:03 GMT+09:00 Bram Moolenaar <[email protected]>:
> > >
> > > > I do not see a clue why this would be different on OS/X.
> > >
> > > As the failure message above indicates, it looks like the functions
> > > isalpha(), isalnum() and ispunct() of OS X accept a wider range of
> > > 8-bit characters as class members.  In other words, in contrast to
> > > Linux, these functions don't assume the standard C locale to
> > > determine their behavior.
> > >
> > > While Linux's man page talks about the C locale
> > > (http://linux.die.net/man/3/isalpha), OS X's man page doesn't
> > > mention it
> > > (https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/isalpha.3.html).
> > >
> > > Actually, when I ran the test like this:
> > >
> > >     $ LC_CTYPE=C make test_alot_utf8
> > >
> > > then the test succeeded.
> > >
> > > So, I feel we need to add something like this to
> > > test_regexp_utf8.vim (please see the attached patch for details,
> > > because it contains a long string):
> > >
> > >     if has('osx')
> > >       lang ctype C
> > >     endif
> > >
> > > But I'd rather wait for a day or two for someone with a better
> > > explanation and solution :-)
> >
> > Well, that may fix the test, but the regexp behavior will still differ
> > between systems.  I'd rather avoid that.  Otherwise some plugins might
> > break on OS/X (and lots of people won't have a chance to try it out).
>
> That has been my concern too, and I've been looking for another
> solution since then :)
>
> TL;DR: Hopefully, the attached patch fixes the issue.
>
> After sending my previous email, I made a small C program to mimic
> test_regexp_utf8.vim and examined the behavior of those ctype
> functions: the differences in the resulting character classes and
> their locale dependency.
>
> Having done that, I concluded that the test failure indeed came from
> a behavioral difference in the ctype functions, and found that simple
> set theory operations were enough to solve the problem.
>
> So I added some extra conditions to some of the `if` statements in
> regexp.c and regexp_nfa.c where isalpha(), islower(), isalnum() and
> ispunct() are called, so that the resulting character classes match
> what Vim expects.
>
> I tested the patch with several different locales, and confirmed that
> it works on my OS X machine.
>
> (If you want to examine the test program source code and its raw
> output data, just let me know.  I'll send them to you in another
> email.)

Thanks.  I think we can leave out the #ifdef and use "< 128" instead of
isascii().

It appears other programs say that what is matched depends on the
locale.  Although that can be useful, it's a nasty dependency, because
the locale is global to the whole program.  What if you have two files
in different locales?  I don't think there is an isalpha() function that
takes a locale argument.

-- 
Save the plankton - eat a whale.

 /// Bram Moolenaar -- [email protected] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\  an exciting new programming language -- http://www.Zimbu.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///
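
For illustration only, here is a minimal sketch of what a "< 128" guard around the ctype calls could look like. The patch itself is not shown in this thread, so the helper names below (ascii_isalpha, ascii_ispunct, count_class) are hypothetical and are not taken from regexp.c or regexp_nfa.c. The idea is just that a byte of 128 or above is never handed to the ctype function, so the resulting class should not vary with LC_CTYPE for those bytes.

```c
#include <ctype.h>

/*
 * Hypothetical helpers, not the code from the attached patch.
 * Values outside 0..127 are rejected before isalpha()/ispunct() is
 * consulted, so 8-bit characters never become class members, whatever
 * the current LC_CTYPE setting is.
 */
static int ascii_isalpha(int c)
{
    return c >= 0 && c < 128 && isalpha(c);
}

static int ascii_ispunct(int c)
{
    return c >= 0 && c < 128 && ispunct(c);
}

/* Count how many of the values 0..255 a predicate accepts. */
static int count_class(int (*pred)(int))
{
    int n = 0;
    int c;

    for (c = 0; c < 256; ++c)
        if (pred(c))
            ++n;
    return n;
}
```

Comparing count_class(isalpha) with count_class(ascii_isalpha) after a call to setlocale(LC_CTYPE, "") is one way to see whether a given system adds 8-bit characters to the class.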
