Re: E16 for invalid regex range is not intuitive

itchyny Mon, 15 May 2017 05:28:06 -0700

On Thursday, May 11, 2017 at 21:52:48 UTC+9, Bram Moolenaar wrote:
> Ken Hamada wrote:
> 
> > Hi list, I noticed that E16 occurs with regexp.
> > 
> > set re=1
> > echo '' =~# '[\uff-\uf0]'
> > E16: Invalid range
> > echo '' =~# '[\u3000-\u4000]'
> > E16: Invalid range
> > 
> > I think this is an unintuitive error message because E16 reminds us of the
> > range of commands (for example :0buffer). So I suggest two mutually
> > exclusive options:
> > 
> > - Add a note that the error can occur with regexp at :h E16.
> > - Change the error message and number.
> 
> From the old days a bit of memory was saved by limiting the number of
> error messages.  Now that we are more interested in useful errors,
> splitting off specific cases is useful.
> 
> > BTW the behaviour *Limit to a range of 256 chars* differs depending on the
> > regex engine version.
> > I could not tell the difference until I dived into the source code. I
> > would like some warning about this to be written around :h /[.
> 
> I thought this was mentioned somewhere, but can't find it right now.
> Please suggest an improvement.


Thank you Bram and Ken Tanaka.

I think creating a new error number would be better.
Here's the patch. What do you think?

diff --git a/runtime/doc/pattern.txt b/runtime/doc/pattern.txt
index d6764096a..6b7ee46c6 100644
--- a/runtime/doc/pattern.txt
+++ b/runtime/doc/pattern.txt
@@ -1075,13 +1075,15 @@ x       A single character, with no special meaning, matches itself
        `:substitute` command the whole command becomes the pattern.  E.g.
        ":s/[/x/" searches for "[/x" and replaces it with nothing.  It does
        not search for "[" and replaces it with "x"!
-
+                                                               *E944*
        If the sequence begins with "^", it matches any single character NOT
        in the collection: "[^xyz]" matches anything but 'x', 'y' and 'z'.
        - If two characters in the sequence are separated by '-', this is
          shorthand for the full list of ASCII characters between them.  E.g.,
-         "[0-9]" matches any decimal digit.  Non-ASCII characters can be
-         used, but the character values must not be more than 256 apart.
+         "[0-9]" matches any decimal digit.  If the starting character exceeds
+         the ending character, e.g. [c-a], E944 occurs.  Non-ASCII characters
+         can be used, but the character values must not be more than 256 apart
+         in the old regexp engine.
        - A character class expression is evaluated to the set of characters
          belonging to that character class.  The following character classes
          are supported:
diff --git a/runtime/doc/todo.txt b/runtime/doc/todo.txt
index 0736ada58..3b9356073 100644
--- a/runtime/doc/todo.txt
+++ b/runtime/doc/todo.txt
@@ -930,8 +930,6 @@ Patch to handle integer overflow. (Aaron Burrow, 2013 Dec 12)
 Patch to add "ntab" item in 'listchars' to repeat first character. (Nathaniel
 Braun, pragm, 2013 Oct 13)  A better solution 2014 Mar 5.
 
-/[b-a] gives error E16, should probably be E769.
-
 7   Windows XP: When using "ClearType" for text smoothing, a column of yellow
     pixels remains when typing spaces in front of a "D" ('guifont' set to
     "lucida_console:h8").
diff --git a/src/regexp.c b/src/regexp.c
index e1f6484c0..8574fc6f2 100644
--- a/src/regexp.c
+++ b/src/regexp.c
@@ -358,6 +358,7 @@ static char_u       *regprop(char_u *);
 static int re_mult_next(char *what);
 
 static char_u e_missingbracket[] = N_("E769: Missing ] after %s[");
+static char_u e_invalidclass[] = N_("E944: Invalid character class");
 static char_u e_unmatchedpp[] = N_("E53: Unmatched %s%%(");
 static char_u e_unmatchedp[] = N_("E54: Unmatched %s(");
 static char_u e_unmatchedpar[] = N_("E55: Unmatched %s)");
@@ -2426,14 +2427,14 @@ collection:
                                endc = coll_get_char();
 
                            if (startc > endc)
-                               EMSG_RET_NULL(_(e_invrange));
+                               EMSG_RET_NULL(_(e_invalidclass));
 #ifdef FEAT_MBYTE
                            if (has_mbyte && ((*mb_char2len)(startc) > 1
                                                 || (*mb_char2len)(endc) > 1))
                            {
                                /* Limit to a range of 256 chars */
                                if (endc > startc + 256)
-                                   EMSG_RET_NULL(_(e_invrange));
+                                   EMSG_RET_NULL(_(e_invalidclass));
                                while (++startc <= endc)
                                    regmbc(startc);
                            }
diff --git a/src/regexp_nfa.c b/src/regexp_nfa.c
index 120861a46..65df8ca78 100644
--- a/src/regexp_nfa.c
+++ b/src/regexp_nfa.c
@@ -1851,7 +1851,7 @@ collection:
                        endc = startc;
                        startc = oldstartc;
                        if (startc > endc)
-                           EMSG_RET_FAIL(_(e_invrange));
+                           EMSG_RET_FAIL(_(e_invalidclass));
 
                        if (endc > startc + 2)
                        {

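To make the engine difference concrete, here is a minimal standalone C sketch of the two range checks this patch touches. It is not Vim source: check_range(), is_multibyte() and the RANGE_* values are hypothetical names, and it assumes 'encoding' is utf-8, so that any character above 0x7f counts as multibyte (standing in for the (*mb_char2len)(c) > 1 test in regexp.c).

```c
/*
 * Sketch (not Vim source) of the range checks changed by the patch.
 * RANGE_*, is_multibyte() and check_range() are hypothetical names.
 */
enum range_check { RANGE_OK, RANGE_REVERSED, RANGE_TOO_LARGE };

/* Assumption: 'encoding' is utf-8, where every code point above 0x7f
 * takes more than one byte; this stands in for (*mb_char2len)(c) > 1. */
int is_multibyte(int c)
{
    return c > 0x7f;
}

/* old_engine != 0 models regexp.c ('regexpengine' = 1);
 * old_engine == 0 models regexp_nfa.c ('regexpengine' = 2). */
enum range_check check_range(int startc, int endc, int old_engine)
{
    /* Reversed ranges such as [c-a] are rejected by both engines. */
    if (startc > endc)
        return RANGE_REVERSED;

    /* Only the backtracking engine limits multibyte ranges to 256 chars. */
    if (old_engine && (is_multibyte(startc) || is_multibyte(endc))
            && endc > startc + 256)
        return RANGE_TOO_LARGE;

    return RANGE_OK;
}
```

With these checks, the reversed range [\uff-\uf0] from the original report fails in both engines, while [\u3000-\u4000] (4096 characters) fails only when the old engine is selected with 're=1'. With the patch applied, both failures report E944.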

Ken Hamada

-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

