Re: Calling Regex Experts

D. J. Thu, 24 Aug 2006 12:33:52 -0700

On 8/24/06, Bowie Bailey <[EMAIL PROTECTED]> wrote:

D.J. wrote:
> On 8/24/06, Bowie Bailey <[EMAIL PROTECTED]> wrote:
> > D.J. wrote:
> > > On 8/24/06, Bart Schaefer < [EMAIL PROTECTED]> wrote:
> > > > On 8/24/06, D. J. <[EMAIL PROTECTED] > wrote:
> > > > >
> > > > > > I'm expecting these types of strings for sure:
> > > > >
> > > > > cat
> > > > > dog
> > > > > cat dog
> > > > > dog cat
> > > > >
> > > > > But I may get something like this too:
> > > > >
> > > > > cat cat dog
> > > > > dog dog
> > > > >
> > > > > Essentially I want it to match if anything other than cat or
> > > > > dog is in the string.
> > > >
> > > > That constraint means you have to construct a regex that can be
> > > > anchored at both beginning and end of string, e.g.
> > > > /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in
> > > > the context of a spamassassin rule, except maybe one matching
> > > > against a specific header.
> > >
> > > That's the idea... I've got the RELAY_COUNTRIES plugin, and I want
> > > it to place a small score if the relay server is not in the US or
> > > Canada.  However, I'm not sure if the plugin will list the same
> > > country multiple times, which is where my uncertainty in the "cat
> > > cat dog" scenario came in.  So far my original rule ( !~ /cat|dog/)
> > > seems to be working well, but if I have a spammer smart enough to
> > > manage to bounce his spam originating in China off of somewhere in
> > > the US before it hits my MX, then that rule will fail.  Am I
> > > possibly too paranoid?
> >
> > Ok.  Try this one:
> >
> >    $value =~ /\b(?!cat\b|dog\b)\w+\b/i
> >
> > This will match any word in the string as long as that word is not
> > "cat" or "dog".
>
> OK, we're actually really close.  That actually matched everything I
> didn't want to match... we just have to get it to do the opposite of
> that.  I have 6 test strings I tested against in a test script:
>
> cat
> dog
> cat dog
> dog cat
> bird
> cat bird
>
> It matched the top four (incorrectly).

Are you sure you used it correctly?  This is a positive match (=~), not a
negative match (!~).

Test program:
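    use strict;
    use warnings;

    # Strings to test; only the last three contain a word other than cat/dog.
    my @strings = ( "cat", "dog", "cat dog", "dog cat", "bird",
                    "cat bird", "caterwaul" );
    for my $str (@strings) {
        # Match any whole word that is not exactly "cat" or "dog".
        if ($str =~ /\b(?!cat\b|dog\b)\w+\b/i) {
            print "$str -- MATCHED\n";
        }
        else {
            print "$str -- no match\n";
        }
    }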

Output:
    cat -- no match
    dog -- no match
    cat dog -- no match
    dog cat -- no match
    bird -- MATCHED
    cat bird -- MATCHED
    caterwaul -- MATCHED
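
For comparison, the anchored pattern suggested earlier in the thread can be negated to answer the same question. This is only a sketch: the helper names has_other_word and only_cat_dog are hypothetical (not from the thread), and /i has been added to the anchored pattern so both sides ignore case.

```perl
use strict;
use warnings;

# Hypothetical helper: true if some word in the string is not exactly
# "cat" or "dog" (the lookahead pattern from the test program).
sub has_other_word {
    my ($str) = @_;
    return $str =~ /\b(?!cat\b|dog\b)\w+\b/i ? 1 : 0;
}

# Hypothetical helper: true if the whole string consists only of
# "cat"/"dog" (the anchored pattern from earlier in the thread, plus /i).
sub only_cat_dog {
    my ($str) = @_;
    return $str =~ /\A(\s*(cat|dog)\s*)+\Z/i ? 1 : 0;
}

# Print both answers side by side for the same test strings.
for my $str ("cat", "dog", "cat dog", "dog cat", "bird", "cat bird", "caterwaul") {
    printf "%-10s lookahead=%d  negated-anchored=%d\n",
        $str, has_other_word($str), only_cat_dog($str) ? 0 : 1;
}
```

The two approaches agree on these seven strings but are not equivalent in general. For example, "catdog" satisfies the anchored pattern (its \s* can match nothing between the two words) while the lookahead pattern treats it as one unknown word, and an empty string matches neither pattern.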

--
Bowie

BINGO! I still had my negative match (!~) in there; I had only copied the /.../ part of the regex. You, sir, are the man!
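
The mix-up is easy to reproduce. In Perl, !~ is the logical negation of =~, so leaving !~ in place with the new pattern reports exactly the strings that contain only cat or dog. A small sketch using the six test strings from the thread (the variable names are illustrative, not from the original script):

```perl
use strict;
use warnings;

# The lookahead pattern from the thread, precompiled for reuse.
my $other_word = qr/\b(?!cat\b|dog\b)\w+\b/i;

for my $str ("cat", "dog", "cat dog", "dog cat", "bird", "cat bird") {
    my $positive = ($str =~ $other_word) ? "true" : "false";  # intended test
    my $negative = ($str !~ $other_word) ? "true" : "false";  # leftover !~
    print "$str: =~ is $positive, !~ is $negative\n";
}
```

With !~ the top four strings come out true and the last two false, which is the inverted result described earlier in the thread.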
