On 4/3/07, Peter Hodge <[EMAIL PROTECTED]> wrote:
--- Yakov Lerner <[EMAIL PROTECTED]> wrote:
> I sometimes use a regex that finds paragraphs
> containing given words w1, w2, ... in any order (I define "paragraph"
> as text separated by blank lines, \n\n).
>
> I use a pattern like this (two-word example, w1 and w2, but easily
> expandable to N words):
>   /\c\(.\|.\n\)*\<w1\>\&\(.\|.\n\)*\<w2\>
> (and I set :set maxmempattern=20000 )
> This works. But search time is unbelievably slow on big files.
>
> My question is: is there a rewrite of this regex that works faster?
>
> To see a test case of how slow this is:
> 1. wget http://www.vmunix.com/~gabor/c/draft.html
>    # this is a ~1.3 MB file.
> 2. vim draft.html
> 3. /\c\(.\|.\n\)*\<w1\>\&\(.\|.\n\)*\<w2\>

Try this pattern:

  /\c\n\zs\%(\%(.\|.\n\)\{-}\<international\>\&\%(.\|.\n\)\{-}\<although\>\)

It has the \n at the start, so it will match at most once per line, and it
uses \{-} instead of * to cut down on backtracking. That search finishes in
30 seconds (on a dual 1.8 GHz G5). You won't need to tweak maxmempattern
either.

Thanks Peter,
This works great.

Yakov
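
Since the original question mentions expanding the pattern to N words, here is a minimal sketch of how the pattern suggested above could be assembled for an arbitrary word list. The function name ParagraphPattern is hypothetical and does not come from the thread. The sketch assumes the words are plain words and only escapes the usual regex metacharacters; it has not been timed against the test file.

```vim
" Hypothetical helper (not from the thread): build the suggested pattern
" for any number of words.
function! ParagraphPattern(words) abort
  let branches = []
  for word in a:words
    " One branch per word: the shortest run of text (newlines allowed)
    " followed by the whole word. Metacharacters in the word are escaped.
    call add(branches, '\%(.\|.\n\)\{-}\<' . escape(word, '\/.*$^~[]') . '\>')
  endfor
  " \c ignores case; \n\zs anchors the match start right after a line break;
  " \& requires every branch to match from that same position.
  return '\c\n\zs\%(' . join(branches, '\&') . '\)'
endfunction

" Usage: load the pattern into the search register, then press n to jump.
let @/ = ParagraphPattern(['international', 'although'])
```

For the two words above this should produce the same pattern as the one in the reply, minus the leading /. Adding a third word to the list just appends another \& branch.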