Re: Google Summer of Code 2007 : Improve regexp performance

Nikolai Weibull Fri, 16 Mar 2007 12:24:39 -0800

On 3/16/07, Bram Moolenaar <[EMAIL PROTECTED]> wrote:

Nikolai Weibull wrote:

> I actually wrote a simplification of his library, removing the
> approximate matching stuff, as part of my master's, which is well
> documented.  I still haven't had time to put up the PDF, though.

Interesting.  Although when the documentation is not available I would
call it undocumented :-).

> Anyway, it would take an immense amount of work to turn Vim onto a new
> regex implementation.  Vim has a whole range of its own stuff, like
> matching cursor positions and so on, and is tightly bound to the
> buffer implementation with its memlines and whatnot.  Not to
> dishearten you, but I don't think this is a project that can be
> completed over a summer (not that it has to be, but you may want to
> keep that in mind).

The idea is that, when compiling the regexp, you check for special items
that are not supported by the new/fast code.  If any is found, fall back
to the old/slow code.  This way you can start with something simple and
add more features over time.  Most patterns don't use back references or
match with the cursor position, thus the new/fast code would be used in
most cases.


Yes, that's a good course of action, and is what most regex libraries
should be doing in my opinion.  It's what Tcl's does, which is,
coincidentally, also written by Henry Spencer (the same guy who wrote
the initial regex implementation that Vim uses).
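
As a minimal sketch of that compile-time check (not Vim's actual code;
the names uses_unsupported_item and choose_engine are hypothetical, and
the set of "unsupported" items is only assumed here to be back
references \1..\9 and the cursor-position item \%#), the dispatch could
look roughly like this:

```python
def uses_unsupported_item(pattern: str) -> bool:
    """Scan a Vim-style pattern for items the fast engine cannot handle.

    Assumption for this sketch: only back references (\\1..\\9) and the
    cursor-position item (\\%#) are unsupported.
    """
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in "123456789":
                return True  # back reference
            if pattern.startswith("%#", i + 1):
                return True  # match with the cursor position
            i += 2  # skip the escaped character (handles "\\\\" correctly)
        else:
            i += 1
    return False


def choose_engine(pattern: str) -> str:
    """Pick "fast" unless the pattern needs a feature only "slow" has."""
    return "slow" if uses_unsupported_item(pattern) else "fast"
```

With this, choose_engine(r"foo\(bar\)*") picks the fast engine, while
choose_engine(r"\(a\)\1") falls back to the slow one.  Supporting a new
item in the fast code then just means dropping it from the check.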

Matching with text in a Vim buffer is required, can't do much without
it.


No, but what I meant was that most regex libraries expect a char * as
input.  As you yourself have stated, adding support for multiline
matching in the current implementation was nontrivial.  However, that
may mostly be due to backtracking issues, and perhaps it may be easier
to interface with some other implementation.
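
To illustrate the interface difference only (this is a toy literal
search, not a regex engine, and get_line, make_getter and find_literal
are hypothetical names, not anything from Vim or an existing library),
a matcher can pull lines through a callback instead of taking one flat
string:

```python
from typing import Callable, List, Optional, Tuple

GetLine = Callable[[int], Optional[str]]


def make_getter(lines: List[str]) -> GetLine:
    """Wrap a list of lines as a 1-based line getter; None past the end."""
    return lambda n: lines[n - 1] if 1 <= n <= len(lines) else None


def _match_at(get_line: GetLine, lnum: int, col: int, parts: List[str]) -> bool:
    line = get_line(lnum)
    if len(parts) == 1:
        return line.startswith(parts[0], col)
    # The first part must run to the end of the starting line.
    if line[col:] != parts[0]:
        return False
    # Any middle parts must each equal a whole line.
    for k, part in enumerate(parts[1:-1], start=1):
        if get_line(lnum + k) != part:
            return False
    # The last part must be a prefix of the final line.
    last = get_line(lnum + len(parts) - 1)
    return last is not None and last.startswith(parts[-1])


def find_literal(get_line: GetLine, needle: str) -> Optional[Tuple[int, int]]:
    """Find needle (which may contain "\\n") in a line-based buffer.

    Returns (line number, column), with 1-based lines and 0-based
    columns, or None if there is no match.
    """
    parts = needle.split("\n")
    lnum = 1
    while (line := get_line(lnum)) is not None:
        for col in range(len(line) + 1):
            if _match_at(get_line, lnum, col, parts):
                return (lnum, col)
        lnum += 1
    return None
```

The point is only that the matcher never sees a single char *; it asks
for line N when it needs it, which is closer to how a buffer made of
separate lines would have to be fed to another implementation.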

 nikolai
