Jeremy Fairbrass wrote: > > Let's say I want to use regex to search for the phrase "color:blue" > within a <span> tag as in the example below (just a made-up example > for the sake of this question): > > <span style="border:0px; color:blue; font-size:small"> > > In this case, the "color:blue" part is preceeded by some other text > ("border:0px") after the first quote mark, but that preceeding text > could in fact be anything, and I want to allow for the fact that it > could be anything. > > I've read at http://www.regular-expressions.info that it's best to > avoid backtracking if possible because that is resource-intensive. > > So one possible solution would be the following: > > /style="(.(?!color))+.color:blue/
This seems to me to be very inefficient. At each point in the string it has to read forward to check for "color". > In other words, after the first " (quote mark) it looks for any > character NOT followed by the word "color", and repeats that with the > + character, until it gets to the actual word "color". I believe this > results in no (or almost no?) backtracking. But I'm not sure if it's > resource-intensive anyway, because of the negative lookahead - are > negative lookaheads particularly resource intensive, when compared to > backtracking? Is one preferable over the other? > > An alternative solution would be this: > > /style="[^>]+color:blue/ This looks better. It is probably less resource-intensive than your previous attempt and is definitely easier to read. But why are you looking for > when you anchor the beginning with a quote? How about this: /style="[^"]+?color:blue/ This is also non-greedy, so it will start looking for the "color:blue" match at the beginning of the string instead of having the + slurp up everything up to the quote and then backtracking to find the match. For SA purposes, you may want to limit the search as well. /style="[^"]{1,20}?color:blue/ This way, it will stop looking after 20 characters. This prevents it from using lots of memory if the quotes aren't closed. > But this will certainly involve some backtracking, especially if > there is even more text after the "color:blue" but before the > closing > character, for example the "font-size:small" text. > > So what do you think?! Which way is best, ie. most efficient or least > resource-intensive? -- Bowie