Re: E16 for invalid regex range is not intuitive

itchyny Tue, 16 May 2017 05:19:20 -0700

On Tuesday, May 16, 2017 at 21:14:32 UTC+9, itc...@hatena.ne.jp wrote:
> On Tuesday, May 16, 2017 at 14:36:04 UTC+9, Bram Moolenaar wrote:
> > > On Thursday, May 11, 2017 at 21:52:48 UTC+9, Bram Moolenaar wrote:
> > > > Ken Hamada wrote:
> > > > 
> > > > > Hi list, I noticed that E16 occurs with regexp.
> > > > > 
> > > > > set re=1
> > > > > echo '' =~# '[\uff-\uf0]'
> > > > > E16: Invalid range
> > > > > echo '' =~# '[\u3000-\u4000]'
> > > > > E16: Invalid range
> > > > > 
> > > > > I think this is an unintuitive error message because E16 reminds us of
> > > > > the range of commands (for example :0buffer). So I suggest two mutually
> > > > > exclusive options:
> > > > > 
> > > > > - Add a note that the error can occur with regexp at :h E16.
> > > > > - Change the error message and number.
> > > > 
> > > > From the old days a bit of memory was saved by limiting the number of
> > > > error messages.  Now that we are more interested in useful errors,
> > > > splitting off specific cases is useful.
> > > > 
> > > > > BTW the behaviour *Limit to a range of 256 chars* differs depending on
> > > > > the regexp engine in use.
> > > > > I could not tell the difference until I dived into the source code. I
> > > > > would like a warning about this to be added around :h /[.
> > > > 
> > > > I thought this was mentioned somewhere, but can't find it right now.
> > > > Please suggest an improvement.
> > > 
> > > Thank you Bram and Ken Tanaka.
> > > 
> > > I think creating a new error number will be better.
> > > Here's the patch. What do you think?
> > 
> > Now that we are at it, we can make it more specific: One error for the
> > reverse range and one for a too large range.
> > 
> > It's also nice to have a test for these.
> 
> Thank you Bram for your advice.
> 
> I updated the patch and added some tests.
> 
> diff --git a/runtime/doc/pattern.txt b/runtime/doc/pattern.txt
> index d6764096a..d487a3d65 100644
> --- a/runtime/doc/pattern.txt
> +++ b/runtime/doc/pattern.txt
> @@ -1075,13 +1075,16 @@ x     A single character, with no special meaning, matches itself
>       `:substitute` command the whole command becomes the pattern.  E.g.
>       ":s/[/x/" searches for "[/x" and replaces it with nothing.  It does
>       not search for "[" and replaces it with "x"!
> -
> +                                                             *E944* *E945*
>       If the sequence begins with "^", it matches any single character NOT
>       in the collection: "[^xyz]" matches anything but 'x', 'y' and 'z'.
>       - If two characters in the sequence are separated by '-', this is
>         shorthand for the full list of ASCII characters between them.  E.g.,
> -       "[0-9]" matches any decimal digit.  Non-ASCII characters can be
> -       used, but the character values must not be more than 256 apart.
> +       "[0-9]" matches any decimal digit. If the starting character exceeds
> +       the ending character like [c-a], E944 occurs. Non-ASCII characters
> +       can be used, but the character values must not be more than 256 apart
> +       in the old regexp engine. For example, searching by [\u3000-\u4000]
> +       after setting re=1 emits E945 error. Prepending \%#=2 will fix it.
>       - A character class expression is evaluated to the set of characters
>         belonging to that character class.  The following character classes
>         are supported:
> diff --git a/runtime/doc/todo.txt b/runtime/doc/todo.txt
> index 0736ada58..3b9356073 100644
> --- a/runtime/doc/todo.txt
> +++ b/runtime/doc/todo.txt
> @@ -930,8 +930,6 @@ Patch to handle integer overflow. (Aaron Burrow, 2013 Dec 12)
>  Patch to add "ntab" item in 'listchars' to repeat first character. (Nathaniel
>  Braun, pragm, 2013 Oct 13)  A better solution 2014 Mar 5.
>  
> -/[b-a] gives error E16, should probably be E769.
> -
>  7   Windows XP: When using "ClearType" for text smoothing, a column of yellow
>      pixels remains when typing spaces in front of a "D" ('guifont' set to
>      "lucida_console:h8").
> diff --git a/src/regexp.c b/src/regexp.c
> index e1f6484c0..93507a73a 100644
> --- a/src/regexp.c
> +++ b/src/regexp.c
> @@ -358,6 +358,8 @@ static char_u     *regprop(char_u *);
>  static int re_mult_next(char *what);
>  
>  static char_u e_missingbracket[] = N_("E769: Missing ] after %s[");
> +static char_u e_reversed_range[] = N_("E944: Reversed range in character class");
> +static char_u e_large_class[] = N_("E945: Too large range in character class");
>  static char_u e_unmatchedpp[] = N_("E53: Unmatched %s%%(");
>  static char_u e_unmatchedp[] = N_("E54: Unmatched %s(");
>  static char_u e_unmatchedpar[] = N_("E55: Unmatched %s)");
> @@ -2426,14 +2428,14 @@ collection:
>                               endc = coll_get_char();
>  
>                           if (startc > endc)
> -                             EMSG_RET_NULL(_(e_invrange));
> +                             EMSG_RET_NULL(_(e_reversed_range));
>  #ifdef FEAT_MBYTE
>                           if (has_mbyte && ((*mb_char2len)(startc) > 1
>                                                || (*mb_char2len)(endc) > 1))
>                           {
>                               /* Limit to a range of 256 chars */
>                               if (endc > startc + 256)
> -                                 EMSG_RET_NULL(_(e_invrange));
> +                                 EMSG_RET_NULL(_(e_large_class));
>                               while (++startc <= endc)
>                                   regmbc(startc);
>                           }
> diff --git a/src/regexp_nfa.c b/src/regexp_nfa.c
> index 120861a46..6df50f061 100644
> --- a/src/regexp_nfa.c
> +++ b/src/regexp_nfa.c
> @@ -1851,7 +1851,7 @@ collection:
>                       endc = startc;
>                       startc = oldstartc;
>                       if (startc > endc)
> -                         EMSG_RET_FAIL(_(e_invrange));
> +                         EMSG_RET_FAIL(_(e_reversed_range));
>  
>                       if (endc > startc + 2)
>                       {
> diff --git a/src/testdir/test_regexp_utf8.vim b/src/testdir/test_regexp_utf8.vim
> index c54b65081..f38332071 100644
> --- a/src/testdir/test_regexp_utf8.vim
> +++ b/src/testdir/test_regexp_utf8.vim
> @@ -137,3 +137,18 @@ func Test_classes_re2()
>    call s:classes_test()
>    set re=0
>  endfunc
> +
> +func Test_reversed_range()
> +  for re in range(0, 2)
> +    exe 'set re=' . re
> +    call assert_fails('call match("abc def", "[c-a]")', 'E944:')
> +  endfor
> +endfunc
> +
> +func Test_large_class()
> +  set re=1
> +  call assert_fails('call match("abc def", "[\u3000-\u4000]")', 'E945:')
> +  set re=2
> +  call assert_equal(0, 'abc def' =~# '[\u3000-\u4000]')
> +  call assert_equal(1, "\u3042" =~# '[\u3000-\u4000]')
> +endfunc
> 
> 
> Ken Hamada


I think I should have added "set re=0" at the end of Test_large_class(), so
that the 'regexpengine' setting is restored for the tests that run after it.
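
For anyone skimming the patch, the order of the two checks in the old engine
can be sketched as a small standalone C function. This is only an
illustration, not the code in src/regexp.c: the enum, the name check_range()
and the is_multibyte flag are made up here, and the real code uses has_mbyte,
mb_char2len() and EMSG_RET_NULL() instead.

```c
/* Hypothetical result codes; Vim itself reports these as error messages. */
typedef enum {
    RANGE_OK,
    RANGE_REVERSED,   /* corresponds to E944 */
    RANGE_TOO_LARGE   /* corresponds to E945 */
} range_status;

/*
 * Mirrors the order of checks in the patched collection code of the old
 * engine: the reversed-range check comes first, then the 256-character
 * limit, which only applies when an end point is a multi-byte character.
 * is_multibyte is an assumed stand-in for the has_mbyte/mb_char2len test.
 */
range_status check_range(int startc, int endc, int is_multibyte)
{
    if (startc > endc)
        return RANGE_REVERSED;
    if (is_multibyte && endc > startc + 256)
        return RANGE_TOO_LARGE;
    return RANGE_OK;
}
```

With this sketch, check_range('c', 'a', 0) gives RANGE_REVERSED and
check_range(0x3000, 0x4000, 1) gives RANGE_TOO_LARGE, matching the two test
cases in the patch for re=1. The new engine only has the first check, which
is why [\u3000-\u4000] works with re=2.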


