2017-03-16 8:40 GMT+09:00 Michal Grochmal <groch...@member.fsf.org>:

> On Wed, Mar 15, 2017 at 11:16:49PM +0100, Bram Moolenaar wrote:
> >
> > Kazunobu Kuriyama wrote:
> >
> > > > But it seems strange that we need to restrict [:cntrl:] and
> > > > [:graph:] to ASCII only.
> > >
> > > Quite understandable.  But otherwise, we will have to either rely
> > > entirely on the is*() functions provided by the OS in use or define
> > > our own character classes independently of any of it.
> > >
> > > The former case implies that the behavior of Vim scripts using
> > > [:class:] depends on the OS in use.   Surely, the latter case is
> > > expected to resolve the flaw of the former, but I'm not sure we can
> > > specify character classes in such a way that almost all users on
> > > various platforms are satisfied with them.
> > >
> > > So, I think at the moment that the ASCII restriction is a reasonable
> > > compromise.  But I'm still quite open to other better solutions.
> >
> > It's a difficult choice.  Either we say the regexp should be portable,
> > and we let Vim define exactly what those classes mean, or we say we must
> > follow how the current system considers characters to be classified.  I
> > wonder when the system knows better, perhaps when something in the
> > system configuration, e.g. the country or language, changes what
> > characters mean?
>
> Yes, it does.  At least on glibc (i.e. GNU).  iswcntrl(3) is defined as any
> character that is *not* part of "print", "alpha", "upper", "lower", "digit",
> "xdigit", "punct".
>
> The problem is that "alpha" and "punct" are affected by locale settings,
> therefore "cntrl" is affected too.  In other words, with a simple regex Vim
> would likely classify either all characters above 0x80 as [:cntrl:] or none
> of them, which may be wrong, since some code points in the higher ranges are
> not printable.
>
> So, the restriction to ASCII values, or better to 0x00-0x00ff, makes sense.
> For example (using some UTF-8 aware terminal emulator and a UTF-8 locale):
>
> 1.   printf "\x00\xc0"  # will print an À
> 2.   printf "\x00\x9f"  # will give the same [:cntrl:] character that 0x9f
>                         # gives under LC_CTYPE=latin1
>
> iswgraph(3) also has a note that it depends on LC_CTYPE, but the definition
> of how this happens seems more convoluted.
>
> For non-UTF encodings, things should be simpler to express as a regex, I guess.
>
> Yet, still for UTF-8, different versions of glibc do have different UTF-8
> tables.  And other systems may as well be more or less up to date with the
> Unicode Consortium's data.  Other OSes may be updated more or less often too.
>
> I'd make a regex for all the ISO8859, KOI8 and EUC locales and leave
> the system to deal with the others.  Then, on *nix LC_CTYPE=C should work
> like latin1 (iso8859-1), and on MS Windows *I believe* that you can set the
> locale to latin1 on any version of it.
>
> Will not test all locales but there will be some tests at least.
>

Hi Michal,

Thank you for the comment.

After I read Bram's comment and yours, I spent a few hours looking into the
issue and skimming http://www.unicode.org/reports/tr18/, in particular,
http://www.unicode.org/reports/tr18/#Compatibility_Properties .

According to the table of Compatibility Property Names, it looks to me
that, as far as :cntrl: and :graph: are concerned, we can get around the
difficulties regarding LC_CTYPE and can implement ctype-like functions,
say, vim_iscntrl() and vim_isgraph(), for Unicode code points in a way
closer to Bram's suggestion.

But I may be missing something.  So I'm anxious to hear your view on that.
Do you think it is feasible to implement ctype-like functions for :cntrl:
and :graph: if we follow the assignment recommendation of Annex C?
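
To make the question concrete, here is a minimal C sketch of what such
functions might look like.  It is only my reading of the Annex C
recommendations, not a tested patch: cntrl is taken as
General_Category=Control, and graph as everything except White_Space,
Control, Surrogate and Unassigned.  The helper names is_white_space() and
is_unassigned() are hypothetical, and is_unassigned() is a stub that only
rejects values outside U+0000..U+10FFFF; a real implementation would need a
table generated from a specific Unicode version.

```c
#include <stdbool.h>

/* [:cntrl:] as General_Category=Control (Cc): the C0 controls,
 * DEL, and the C1 controls.  No locale lookup is involved. */
bool vim_iscntrl(int c)
{
    return (c >= 0x00 && c <= 0x1f) || (c >= 0x7f && c <= 0x9f);
}

/* Hypothetical helper: code points with the White_Space property. */
static bool is_white_space(int c)
{
    return (c >= 0x09 && c <= 0x0d) || c == 0x20 || c == 0x85 || c == 0xa0
        || c == 0x1680 || (c >= 0x2000 && c <= 0x200a)
        || c == 0x2028 || c == 0x2029 || c == 0x202f
        || c == 0x205f || c == 0x3000;
}

/* Hypothetical stub (assumption): only values outside the Unicode range
 * are treated as unassigned.  Unassigned code points inside the range
 * need a version-specific table, which this sketch does not have. */
static bool is_unassigned(int c)
{
    return c < 0 || c > 0x10ffff;
}

/* [:graph:] as everything that is not white space, control,
 * surrogate, or unassigned. */
bool vim_isgraph(int c)
{
    bool surrogate = (c >= 0xd800 && c <= 0xdfff);

    return !is_white_space(c) && !vim_iscntrl(c)
        && !surrogate && !is_unassigned(c);
}
```

With this sketch, 0x9f is classified as cntrl and 0xc0 (À) as graph
regardless of LC_CTYPE, which matches the examples given above for the
0x00-0xff range.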

Kazunobu

>
> --
> Mike Grochmal
> GPG key ID 0xC840C4F6
>
> --
> --
> You received this message from the "vim_dev" maillist.
> Do not top-post! Type your reply below the text you are replying to.
> For more information, visit http://www.vim.org/maillist.php
>
> ---
> You received this message because you are subscribed to the Google Groups
> "vim_dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to vim_dev+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

