Re: Understanding regxp implementation

Bram Moolenaar Thu, 22 Mar 2007 13:21:19 -0800

Nikolai Weibull wrote:

> On 3/22/07, Asiri Rathnayake <[EMAIL PROTECTED]> wrote:
> 
> > As you might know, the reg_comp() method is called twice when compiling
> > a regexp: first to determine the size of the compiled expression and then
> > to actually compile it. I was wondering whether this can be used to our
> > advantage: on the first pass, we look for occurrences of special
> > characters and set a flag in regprog_T accordingly. If no such character
> > is found, we branch off the second pass into one of our own routines to
> > compile the expression into our own structures (say, a state diagram).
> > We would also have to change other functions a bit to check the flag
> > and call the new routines as appropriate. What do you think?
> 
> That sounds like a good way of determining whether the old engine will
> be required or if a new one (with more "limited" functionality) should
> be used.  One way of keeping this information as local as possible
> would be to keep a set of function pointers with the compiled regex
> that point to the appropriate functions to execute them on some input.
> 
> For example, you could have something like this:
> 
> typedef struct
> {
>     int         (*exec)();
>     int         regstart;
>     char_u      reganch;
>     char_u      *regmust;
>     int         regmlen;
>     unsigned    regflags;
>     char_u      reghasz;
>     char_u      program[1];     /* actually longer.. */
> } regprog_T;
> 
> and change vim_regexec() to call the exec() function of the regprog_T
> in the regmatch_T that it gets passed.  You'd then set exec() to point
> to either vim_old_regexec() or vim_new_regexec() (or similarly named
> functions) in vim_regcomp() depending on the type of regex we have.  I
> guess it could be some flag field as well, but this makes it possible
> to add a third matcher, should we so desire...like a
> Boyer-Moore-Horspool matcher for fixed strings.
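
A minimal sketch of the function-pointer idea quoted above, under loud assumptions: demo_prog_T is a stripped-down stand-in for regprog_T, every demo_* name is hypothetical, and the two "engines" are toys (a naive matcher that understands '.' and a plain strstr() search), not Vim's matchers. It only shows how the compile step could pick the executor once and store it with the compiled program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down stand-in for Vim's regprog_T. */
typedef struct demo_prog demo_prog_T;
struct demo_prog
{
    /* Chosen once by demo_regcomp(), used by every later match. */
    int         (*exec)(const demo_prog_T *prog, const char *text);
    const char  *pattern;
};

/* Toy "old" engine: naive search where '.' matches any single character. */
int demo_old_exec(const demo_prog_T *prog, const char *text)
{
    size_t plen = strlen(prog->pattern);
    size_t tlen = strlen(text);
    size_t i, j;

    for (i = 0; i + plen <= tlen; ++i)
    {
        for (j = 0; j < plen; ++j)
            if (prog->pattern[j] != '.' && prog->pattern[j] != text[i + j])
                break;
        if (j == plen)
            return 1;           /* every pattern position matched */
    }
    return 0;
}

/* Toy "new" engine: fixed-string search only. */
int demo_new_exec(const demo_prog_T *prog, const char *text)
{
    return strstr(text, prog->pattern) != NULL;
}

/* Compile step: store the executor that fits the pattern. */
void demo_regcomp(demo_prog_T *prog, const char *pattern)
{
    assert(prog != NULL && pattern != NULL);
    prog->pattern = pattern;
    prog->exec = strchr(pattern, '.') != NULL ? demo_old_exec : demo_new_exec;
}

/* The caller never names an engine; it calls through the pointer. */
int demo_regexec(const demo_prog_T *prog, const char *text)
{
    assert(prog != NULL && prog->exec != NULL);
    return prog->exec(prog, text);
}
```

With this layout, adding another matcher only touches demo_regcomp(); the price is that demo_regexec() no longer shows which function runs.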


Adding a third matcher won't happen soon and would be a big change, so
there is no real need to prepare for it.

The disadvantage of using a function pointer is that in the place where
it's used you only see:

         myprog->exec(args);

You can't see which function is being called, and finding out is not
that simple.  So when you browse the code this is like a dead end.

Using this keeps navigating much simpler:

        if (myprog->difficultregexp)
                regmatch_old(args);
        else
                regmatch_new(args);
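
To make the branch above concrete, here is a rough sketch of how the first pass could set such a flag and how the dispatch would then read. Everything named sketch_* is hypothetical, the list of "special" characters is only an assumption (the real list would depend on what the new engine supports), and the two matchers are stubs that merely report which engine would have run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes: the stubs only say which engine ran. */
enum { SKETCH_OLD_ENGINE = 1, SKETCH_NEW_ENGINE = 2 };

/* Hypothetical, stripped-down stand-in for regprog_T. */
typedef struct
{
    int         difficultregexp;    /* set during the first pass */
    const char  *pattern;
} sketch_prog_T;

/* Assumed set of characters the new engine cannot handle. */
static const char sketch_special[] = "\\[]*.^$~";

/* First pass: return 1 if the pattern contains any special character. */
int sketch_first_pass(const char *pattern)
{
    /* strcspn() stops at the first special character or at the NUL. */
    return pattern[strcspn(pattern, sketch_special)] != '\0';
}

void sketch_regcomp(sketch_prog_T *prog, const char *pattern)
{
    assert(prog != NULL && pattern != NULL);
    prog->pattern = pattern;
    prog->difficultregexp = sketch_first_pass(pattern);
}

/* Stub standing in for the existing backtracking matcher. */
int sketch_regmatch_old(const sketch_prog_T *prog, const char *text)
{
    (void)prog;
    (void)text;
    return SKETCH_OLD_ENGINE;
}

/* Stub standing in for the new, more limited matcher. */
int sketch_regmatch_new(const sketch_prog_T *prog, const char *text)
{
    (void)prog;
    (void)text;
    return SKETCH_NEW_ENGINE;
}

/* Both possible callees are named right at the call site. */
int sketch_regexec(const sketch_prog_T *prog, const char *text)
{
    assert(prog != NULL);
    if (prog->difficultregexp)
        return sketch_regmatch_old(prog, text);
    else
        return sketch_regmatch_new(prog, text);
}
```

A tags file or a plain text search for sketch_regmatch_old leads straight to sketch_regexec(), which is the navigation benefit in question.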

One reason why inheritance in object-oriented programming makes our lives
more difficult is that you quite often don't know for sure which method
is invoked.

-- 
ROBIN:  The what?
ARTHUR: The Holy Hand Grenade of Antioch.  'Tis one of the sacred relics
        Brother Maynard always carries with him.
ALL:    Yes. Of course.
ARTHUR: (shouting) Bring up the Holy Hand Grenade!
                 "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///
