On 4/3/07, Peter Hodge <[EMAIL PROTECTED]> wrote:

--- Yakov Lerner <[EMAIL PROTECTED]> wrote:

> I sometimes use a regex that finds paragraphs
> containing given words w1, w2, ... in any order (I define a "paragraph"
> as text separated by blank lines, \n\n).
>
> I use a pattern like this (two-word example, w1 and w2, but easily
> expandable to N words):
>         /\c\(.\|.\n\)*\<w1\>\&\(.\|.\n\)*\<w2\>
>        (and I use :set maxmempattern=20000)
> This works. But search time is unbelievably slow on big files.
>
> My question is: is there a rewrite of this regex that works faster?
>
> Here is a test case that shows how slow this is:
>    1. wget http://www.vmunix.com/~gabor/c/draft.html
>       # this is ~1.3 MB file.
>    2. vim draft.html
>    3. /\c\(.\|.\n\)*\<w1\>\&\(.\|.\n\)*\<w2\>

Try this pattern:

  /\c\n\zs\%(\%(.\|.\n\)\{-}\<international\>\&\%(.\|.\n\)\{-}\<although\>\)

It has the \n at the start so it will match at most once per line, and it uses
\{-} instead of * to reduce backtracking. That search finishes in 30 seconds
(on a dual 1.8 GHz G5). You won't need to tweak maxmempattern either.
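
For comparison, the same "all words, any order, within one paragraph" test can be sketched outside Vim. This is only a rough illustration in Python, not a replacement for the pattern above. The function name and sample text are made up for the example, and Python's \b is assumed to be close enough to Vim's \< and \> for plain words.

```python
import re

def paragraphs_with_all_words(text, words):
    """Return the paragraphs (blocks separated by blank lines) that contain every word."""
    # One case-insensitive whole-word matcher per word, roughly mirroring \c and \<word\>.
    matchers = [re.compile(r"\b" + re.escape(w) + r"\b", re.IGNORECASE) for w in words]
    # Treat one or more blank lines as the paragraph separator.
    paragraphs = re.split(r"\n\s*\n", text)
    # Keep a paragraph only if every word is found somewhere in it, in any order.
    return [p for p in paragraphs if all(m.search(p) for m in matchers)]

# Hypothetical sample text with three paragraphs.
sample = (
    "Although it is small,\nit is international.\n\n"
    "Only international here.\n\n"
    "although alone"
)
hits = paragraphs_with_all_words(sample, ["international", "although"])
print(len(hits))
```

Each word is checked separately per paragraph, so the cost grows with the number of words rather than with the number of ways the branches can overlap.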

Thanks Peter,
This works great.
Yakov
