On 2/18/2014 1:26 PM, Marc Perkel wrote:
On 2/18/2014 9:32 AM, John Hardin wrote:
On Tue, 18 Feb 2014, Marc Perkel wrote:
Trying to do something complex and not sure how it's done. What I'm
looking for is to combine 2 conditions in a single regular expression
so that both have to be true for a match. Yes - I know I can make 2
SA rules and combine them but I bet there's a way to do it in one
expression. For simplicity here's the challenge.
A chunk of text has to include the word "cat" 5 times and the word
"dog" 4 times to be a match. How do you do that?
I assume there are no restrictions on the order in which the occurrences
appear? That makes it rather difficult, and thus expensive.
Two individual simple "tflags multiple maxhits=N" REs and a meta to
combine them would be much more efficient than a single RE. Is this
just an intellectual exercise, and/or something not limited to the SA
environment?
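
For readers outside the SA environment, the "two simple counts plus a combining step" idea can be sketched in Python. This is only an illustration of the approach, not SpamAssassin rule syntax; the names CAT, DOG and has_enough are made up for the example.

```python
import re

# Two cheap, independent patterns (hypothetical names for this sketch).
CAT = re.compile(r"\bcat\b")
DOG = re.compile(r"\bdog\b")

def has_enough(text, min_cat=5, min_dog=4):
    """Count each word separately, then combine the two results.

    This mirrors the idea of two simple rules joined by a meta rule:
    each scan is linear and neither depends on the other's position.
    """
    return len(CAT.findall(text)) >= min_cat and len(DOG.findall(text)) >= min_dog
```

Each scan runs once over the text, so the order of the words does not matter and no backtracking across the two conditions is needed.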
Yes - no order - it is expensive - don't care. It needs to be a single regex.
Try this:
(?=(?:.*?\bcat\b){5}).*?(?:.*?\bdog\b){4}
It will match on a string with at least 5 instances of "cat" and at
least 4 of "dog", in any order. The "\b" word-boundary assertions ensure
that it will not match inside words like "catapult" or "dogma", and it
will also ignore strings like "catcat" or "dogcat". Remove them if you
don't care about partial word matches.
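
A quick way to check the expression is to run it against a few sample strings. The sketch below uses Python's re module rather than Perl or SA, and the helper name matches is hypothetical; the pattern itself is the one given above, unchanged.

```python
import re

# The single-regex solution from this message, unchanged.
PATTERN = r"(?=(?:.*?\bcat\b){5}).*?(?:.*?\bdog\b){4}"

def matches(text, flags=0):
    """Return True if text has at least 5 'cat' and at least 4 'dog' words."""
    return re.search(PATTERN, text, flags) is not None

# Order does not matter: the lookahead checks for the cats without
# consuming anything, then the rest of the pattern looks for the dogs.
print(matches("dog dog dog dog cat cat cat cat cat"))
print(matches("cat cat cat cat dog dog dog dog"))  # only 4 cats
```

One caveat worth knowing: "." does not match a newline by default, so if the cats and dogs are on different lines the expression needs the "s" modifier in Perl, or re.DOTALL in this Python sketch, e.g. matches(text, re.DOTALL).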
If you want to match a string with the exact number of matches, that
will be more complicated.
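
As a rough idea of what "more complicated" could look like, here is one possible sketch of an exact-count version, again in Python. It is an assumption about how one might do it, not a tested SA rule: each lookahead steps over text that does not start the target word, consumes the word exactly N times, and then requires that no further occurrence appears before the end of the string. The helper names exactly and matches_exactly are invented for the example.

```python
import re

def exactly(word, n):
    """Build a lookahead that succeeds only if `word` occurs exactly n times.

    `skip` consumes one character at a time, but only at positions where
    the whole word does not start, so it can never step over an occurrence.
    """
    w = r"\b" + re.escape(word) + r"\b"
    skip = rf"(?:(?!{w}).)*"
    return rf"(?=(?:{skip}{w}){{{n}}}{skip}\Z)"

# Both lookaheads are anchored at the start of the string, so each one
# inspects the entire text independently of the other.
EXACT = re.compile(r"\A" + exactly("cat", 5) + exactly("dog", 4), re.DOTALL)

def matches_exactly(text):
    return EXACT.search(text) is not None
```

The same pattern text should carry over to Perl-style regexes with the "s" modifier, though that has not been verified here.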
--
Bowie