Jeremy Fairbrass wrote:
> 
> Let's say I want to use regex to search for the phrase "color:blue"
> within a <span> tag as in the example below (just a made-up example
> for the sake of this question):
> 
> <span style="border:0px; color:blue; font-size:small">
> 
> In this case, the "color:blue" part is preceded by some other text
> ("border:0px") after the first quote mark, but that preceding text
> could in fact be anything, and I want to allow for the fact that it
> could be anything.
> 
> I've read at http://www.regular-expressions.info that it's best to
> avoid backtracking if possible because that is resource-intensive.
> 
> So one possible solution would be the following:
> 
> /style="(.(?!color))+.color:blue/

This seems very inefficient to me.  At every character position in the
attribute value, it has to run the lookahead and read forward to check
for "color".

> In other words, after the first " (quote mark) it looks for any
> character NOT followed by the word "color", and repeats that with the
> + character, until it gets to the actual word "color". I believe this
> results in no (or almost no?) backtracking. But I'm not sure if it's
> resource-intensive anyway, because of the negative lookahead - are
> negative lookaheads particularly resource intensive, when compared to
> backtracking? Is one preferable over the other?
> 
> An alternative solution would be this:
> 
> /style="[^>]+color:blue/

This looks better.  It is probably less resource-intensive than your
previous attempt and is definitely easier to read.  But why are you
looking for > when you anchor the beginning with a quote?

How about this:

    /style="[^"]+?color:blue/

This is also non-greedy, so it will start looking for the "color:blue"
match right after the opening quote instead of having the + slurp up
everything up to the closing quote and then backtracking to find the
match.
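
To make the difference concrete, here is a rough sketch in Python rather
than Perl (the regex syntax is the same for these patterns).  The second
and third tags are made-up inputs of mine, not something from your
message, and the variable names are just for illustration.

```python
import re

# Sample tag from the question.
tag = '<span style="border:0px; color:blue; font-size:small">'

greedy = re.compile(r'style="[^>]+color:blue')   # the [^>]+ version
lazy = re.compile(r'style="[^"]+?color:blue')    # the [^"]+? version

# On the sample tag, both patterns end up matching the same text.
greedy_match = greedy.search(tag).group(0)
lazy_match = lazy.search(tag).group(0)

# Hypothetical tag where "color:blue" sits in a different attribute.
other = '<span style="border:0px" title="color:blue">'
crosses_quote = greedy.search(other) is not None  # [^>] runs past the closing quote
stays_inside = lazy.search(other) is None         # [^"] cannot leave the style value

# Hypothetical tag where "color:blue" is the first property.
# +? still needs at least one character before "color", so this fails;
# *? would be needed if that case matters.
first_property = '<span style="color:blue">'
needs_prefix = lazy.search(first_property) is None
```

So the [^"] class is not only about speed: it also keeps the match
inside the style value.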

For SA purposes, you may want to limit the search as well.

    /style="[^"]{1,20}?color:blue/

This way, it will stop looking after 20 characters.  This keeps it
from scanning a long stretch of the message (and burning CPU time) if
the quotes aren't closed.
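
A quick way to check how the limit behaves, again sketched in Python
under the assumption that the pattern carries over unchanged.  The test
strings are invented for the example.

```python
import re

bounded = re.compile(r'style="[^"]{1,20}?color:blue')

# 12 characters ("border:0px; ") come before "color", so this is within the limit.
short_prefix = 'style="border:0px; color:blue"'

# 36 characters come before "color", so the bounded pattern gives up.
long_prefix = 'style="' + 'margin:0px; ' * 3 + 'color:blue"'

# Unclosed quote followed by a long run of text: at most 20 characters
# are examined after the opening quote before the attempt fails.
unclosed = 'style="' + 'x' * 10000

short_hit = bounded.search(short_prefix) is not None
long_hit = bounded.search(long_prefix) is not None
unclosed_hit = bounded.search(unclosed) is not None
```

The trade-off is that a legitimate match more than 20 characters into
the style value is missed, so pick the limit to suit what you expect to
see.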

> But this will certainly involve some backtracking, especially if
> there is even more text after the "color:blue" but before the
> closing > character, for example the "font-size:small" text.
> 
> So what do you think?! Which way is best, ie. most efficient or least
> resource-intensive?

-- 
Bowie
