2017-03-16 8:40 GMT+09:00 Michal Grochmal <groch...@member.fsf.org>:
> On Wed, Mar 15, 2017 at 11:16:49PM +0100, Bram Moolenaar wrote:
> >
> > Kazunobu Kuriyama wrote:
> >
> > > > But it seems strange that we need to restrict [:cntrl:] and
> > > > [:graph:] to ASCII only.
> > >
> > > Quite understandable. But otherwise, we will have to either rely
> > > entirely on the is*() functions provided by the OS in use or define
> > > our own character classes independently of any of them.
> > >
> > > The former case implies that the behavior of Vim scripts using
> > > [:class:] depends on the OS in use. Surely, the latter case is
> > > expected to resolve the flaw of the former, but I'm not sure we can
> > > specify character classes in such a way that almost all users on
> > > various platforms are satisfied with them.
> > >
> > > So, I think at the moment that the ASCII restriction is a reasonable
> > > compromise. But I'm still quite open to other better solutions.
> >
> > It's a difficult choice. Either we say the regexp should be portable,
> > and we let Vim define exactly what those classes mean, or we say we must
> > follow how the current system considers characters to be classified. I
> > wonder when the system knows better, perhaps when something in the
> > system configuration, e.g. the country or language, changes what
> > characters mean?
>
> Yes, it does, at least on glibc (i.e. GNU). iswcntrl(3) is defined as any
> character that is *not* part of "print", "alpha", "upper", "lower",
> "digit", "xdigit", "punct".
>
> The problem is that "alpha" and "punct" are affected by locale settings,
> therefore "cntrl" is affected too. In other words, with a simple regex Vim
> would likely either classify all >0x80 characters as [:cntrl:] or none of
> them, which may be erroneous since some UTF-8 characters in the higher
> ranges are not printable.
>
> So, the restriction to ASCII values, or better to 0x00-0xff, makes sense.
> For example (using a UTF-8 aware terminal emulator and a UTF-8 locale):
>
> 1. printf "\x00\xc0" # will print an À
> 2. printf "\x00\x9f" # will give the same [:cntrl:] character that 0x9f
>                      # gives under LC_CTYPE=latin1
>
> iswgraph(3) also has a note that it depends on LC_CTYPE, but the definition
> of how this happens seems more convoluted.
>
> For non-UTF encodings things should be simpler to regex, I guess.
>
> Yet, still for UTF-8, different versions of glibc have different UTF-8
> tables. Other systems may likewise be more or less up to date with the
> Unicode Consortium's tables, and updated more or less often.
>
> I'd make a regex for all the ISO8859, KOI8 and EUC locales and leave
> the system to deal with the others. Then, on *nix LC_CTYPE=C should work
> like latin1 (iso8859-1), and on MS Windows *I believe* that you can set
> the locale to latin1 on any version of it.
>
> I will not test all locales, but there will be some tests at least.
>

Hi Michal,

Thank you for the comment. After reading Bram's comment and yours, I spent
a few hours looking into the issue and skimming
http://www.unicode.org/reports/tr18/, in particular
http://www.unicode.org/reports/tr18/#Compatibility_Properties .

According to the table of Compatibility Property Names, it looks to me as
if, as far as [:cntrl:] and [:graph:] are concerned, we can get around the
difficulties regarding LC_CTYPE and implement ctype-like functions, say,
vim_iscntrl() and vim_isgraph(), for Unicode code points in a way closer to
Bram's suggestion.

But I may be missing something, so I'm eager to hear your view on that. Do
you think it is feasible to implement ctype-like functions for [:cntrl:] and
[:graph:] if we follow the assignment recommendations of Annex C?

Kazunobu

>
> --
> Mike Grochmal
> GPG key ID 0xC840C4F6
>
> --
> --
> You received this message from the "vim_dev" maillist.
> Do not top-post! Type your reply below the text you are replying to.
> For more information, visit http://www.vim.org/maillist.php
>
> ---
> You received this message because you are subscribed to the Google Groups
> "vim_dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to vim_dev+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
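As a rough illustration of what following Annex C would mean for the two classes Kazunobu asks about, here is a minimal C sketch. It assumes code points are passed as uint32_t values. The names uc_iscntrl(), uc_isspace() and uc_isgraph() are hypothetical stand-ins for the vim_iscntrl()/vim_isgraph() floated in the mail, not functions from Vim's source. Under Annex C, [:cntrl:] is simply gc=Control, which is the fixed set U+0000..U+001F and U+007F..U+009F, so it needs no locale and happens to lie entirely inside the 0x00-0xff range Michal mentions. [:graph:] excludes white space, controls, surrogates and unassigned code points; the unassigned (Cn) check needs generated tables and is deliberately left out, so this sketch over-reports unassigned code points as graphic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers sketching the Unicode TR18 Annex C
 * compatibility properties; these are NOT Vim's actual functions. */

/* [:cntrl:] per Annex C is \p{gc=Control} (Cc): exactly the C0 controls,
 * DEL and the C1 controls, so no locale or table lookup is needed. */
bool uc_iscntrl(uint32_t c)
{
    return c <= 0x1F || (c >= 0x7F && c <= 0x9F);
}

/* \p{White_Space} as listed in the Unicode Character Database's
 * PropList.txt (Unicode 6.3 and later). */
bool uc_isspace(uint32_t c)
{
    return (c >= 0x09 && c <= 0x0D) || c == 0x20 || c == 0x85 || c == 0xA0
        || c == 0x1680 || (c >= 0x2000 && c <= 0x200A)
        || c == 0x2028 || c == 0x2029 || c == 0x202F || c == 0x205F
        || c == 0x3000;
}

/* [:graph:] per Annex C is
 *   [^\p{space}\p{gc=Control}\p{gc=Surrogate}\p{gc=Unassigned}]
 * SIMPLIFICATION (assumption of this sketch): the gc=Unassigned exclusion
 * is omitted; a real implementation would need a table generated from
 * UnicodeData.txt for the Unicode version being targeted. */
bool uc_isgraph(uint32_t c)
{
    if (c > 0x10FFFF)               /* not a Unicode code point */
        return false;
    if (c >= 0xD800 && c <= 0xDFFF) /* gc=Surrogate (Cs) */
        return false;
    return !uc_isspace(c) && !uc_iscntrl(c);
}
```

For example, uc_iscntrl(0x85) is true and uc_isgraph(0x85) is false (NEL is both a control and white space), while uc_isgraph(0xC0) is true for À regardless of LC_CTYPE. Because the control range is closed, only [:graph:] would mainly depend on which Unicode version the tables are generated from, which is the portability trade-off discussed in the thread.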