2016-04-26 6:28 GMT+09:00 Bram Moolenaar <[email protected]>:

>
> Kazunobu Kuriyama wrote:
>
> > 2016-04-25 5:03 GMT+09:00 Bram Moolenaar <[email protected]>:
> >
> > > I do not see a clue why this would be different on OS/X.
> >
> > As the failure message above indicates, it looks the functions isalpha(),
> > isalnum() and ispunct() of OS X accept a wider range of 8-bit characters as
> > class members.  In other words, in contrast to Linux, these functions don't
> > assume the standard C locale to determine their behaviors.
> >
> > While Linux's man page talks about the C locale (
> > http://linux.die.net/man/3/isalpha), OS X's man page doesn't mention about
> > it (
> > https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/isalpha.3.html
> > ).
> >
> > Actually, when I ran the test like this:
> >
> > $ LC_CTYPE=C make test_alot_utf8
> >
> > then the test succeeded.
> >
> > So, I feel we need to add something like this to test_regexp_utf8.vim
> > (please see the attached patch for details, because it contains a long
> > string):
> >
> > if has('osx')
> >   lang ctype C
> > endif
> >
> > But I'd rather like to wait for a day or two for someone with a better
> > explanation and solution :-)
>
> Well, that may fix the test, but the regexp behavior will still differ
> between systems.  I rather avoid that.  Otherwise some plugins might
> break on OS/X (and lots of people won't have a chance to try it out).
>
That has also been my concern, and I've been looking for another
solution since then :)

TL;DR: Hopefully, the attached patch fixes the issue.

After sending my previous email, I wrote a small C program that mimics
test_regexp_utf8.vim and examined the behavior of those ctype functions:
how the resulting character classes differ and how they depend on the
locale.

From that, I concluded that the test failure did come from the
behavioral difference of the ctype functions, and found that simple
set-theoretic operations were enough to solve the problem.

So I added extra conditions to some of the `if` statements in regexp.c
and regexp_nfa.c where isalpha(), islower(), isalnum() and ispunct() are
called, so that the resulting character classes match what Vim expects.

I tested the patch with several different locales and confirmed that it
works on my OS X machine.

(If you want to examine the test program's source code and its raw
output data, just let me know.  I'll send them in a separate email.)

> I wonder what generally the behavior of [[:alpha:]] is, include
> non-ascii characters or not?

I think it's OK to keep the current behavior.  We can wait and see until
we're sure whether or not the enhancement Apple made is actually
beneficial to users.

I'd like to hear other thoughts as well.

Best regards,
Kazunobu Kuriyama
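For readers without the test program at hand, here is a minimal sketch of the kind of probe described above. It is not the actual test program (which is not included in this message), and the names count_high_bytes(), ascii_isalpha() and CTYPE_PROBE_MAIN are made up for illustration. It counts how many bytes in 0x80..0xFF each ctype function accepts, once under the LC_CTYPE taken from the environment and once under the C locale. ascii_isalpha() shows one plausible shape for an "extra condition"; the real conditions are in the attached patch and may well differ.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: count the bytes in 0x80..0xFF that a ctype
 * predicate accepts under the LC_CTYPE locale currently in effect. */
int count_high_bytes(int (*pred)(int))
{
    int c, n = 0;

    for (c = 0x80; c <= 0xFF; ++c)
        if (pred(c))
            ++n;
    return n;
}

/* Hypothetical helper: isalpha() intersected with the ASCII range, so
 * the result does not depend on the locale's treatment of high bytes.
 * This is an assumption about what an "extra condition" could look
 * like, not a copy of the patch. */
int ascii_isalpha(int c)
{
    return c >= 0 && c < 0x80 && isalpha(c);
}

#ifdef CTYPE_PROBE_MAIN
int main(void)
{
    const char *names[] = { "isalpha", "islower", "isalnum", "ispunct" };
    int (*preds[])(int) = { isalpha, islower, isalnum, ispunct };
    const char *locales[] = { "", "C" };  /* "" = use the environment */
    size_t i, j;

    for (i = 0; i < 2; ++i) {
        const char *got = setlocale(LC_CTYPE, locales[i]);

        printf("LC_CTYPE=%s\n", got ? got : "(setlocale failed)");
        for (j = 0; j < 4; ++j)
            printf("  %-8s accepts %3d bytes in 0x80..0xFF\n",
                   names[j], count_high_bytes(preds[j]));
    }
    return 0;
}
#endif
```

Building with -DCTYPE_PROBE_MAIN gives a standalone tool; running it with different LC_CTYPE values on OS X and on Linux should make any difference in the high-byte classes visible. Under the C locale, isalpha() is expected to accept none of those bytes.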
For more information, visit http://www.vim.org/maillist.php

---
You received this message because you are subscribed to the Google Groups "vim_dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
osx-regexp-ctype.patch
Description: Binary data
