Re: case of very slow regex search

Peter Hodge Tue, 03 Apr 2007 16:49:39 -0700

--- Yakov Lerner <[EMAIL PROTECTED]> wrote:

> I sometimes use a regex that finds paragraphs
> containing given words w1, w2, ... in any order (I define "paragraph"
> as text separated by blank lines, \n\n).
> 
> I use the pattern like this: (two-word example, w1 and w2, but easily
> expandable for N words):
>         /\c\(.\|.\n\)*\<w1\>\&\(.\|.\n\)*\<w2\>
>        (and I set :set maxmempattern=20000 )
> This works. But search time is unbelievably slow on big files.
> 
> My question is: is there a rewrite of this regex that works faster?
> 
> To see a test case of how slow this is:
>    1. wget http://www.vmunix.com/~gabor/c/draft.html
>       # this is a ~1.3 MB file.
>    2. vim draft.html
>    3. /\c\(.\|.\n\)*\<w1\>\&\(.\|.\n\)*\<w2\>


Try this pattern:

  /\c\n\zs\%(\%(.\|.\n\)\{-}\<international\>\&\%(.\|.\n\)\{-}\<although\>\)

It has the \n at the start so it will match at most once per line, and it uses
\{-} instead of * to cut down on backtracking. That search finishes in 30
seconds (on a dual 1.8 GHz G5). You won't need to tweak maxmempattern either.
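
For more than two words, the same pattern can be built up mechanically. Here
is a minimal sketch in Vim script, assuming the words are plain text with no
regex metacharacters. ParagraphPattern is a hypothetical helper name, not
something built into Vim:

```vim
" Hypothetical helper: build the paragraph-search pattern for N words.
" Assumes each argument is a plain word (no regex metacharacters).
function! ParagraphPattern(...) abort
  let branches = []
  for word in a:000
    " Shortest run of text that does not cross a blank line, up to the word.
    call add(branches, '\%(.\|.\n\)\{-}\<' . word . '\>')
  endfor
  " \n\zs anchors the match at the start of a line; \& requires every
  " branch to match at that same position.
  return '\c\n\zs\%(' . join(branches, '\&') . '\)'
endfunction

" Usage: load the pattern into the search register, then press n.
let @/ = ParagraphPattern('international', 'although')
```

With the two words shown, this should produce the same pattern as the one
above; extra words just add more \& branches.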

regards,
Peter


