On Tue, 18 Nov 2003, Chris Santerre wrote:

> > > uri      WLS_URI_1 /^http:.*\b0-go.org\b/i
>
> Regex confusion on my part! '\b' is bounding, but I thought that meant bound
> by space??? wouldn't this above regex _NOT_ hit :
>
> http://stuff.0-go.org/stuff
>
> Isn't it looking for:
> http://stuff. 0-go.org
>
> I'm confused! (it's not the first time, won't be the last!)
>
> --Chris

The "\b" match operator is a bit special in that it does not
match a specific character but the "gap" between two adjacent
characters. Think of it like the "insertion" cursor of a word
processor: it points between the characters, not on a character.
It is sort of like the way "^" points to the beginning of a line,
not at the first character of the line.

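If you know what the perl "\w" and "\W" character classes are,
then \b points to the boundary between two characters that are
matched by either the regex "\W\w" or the regex "\w\W".
(The very start and the very end of the string count as "\W"
for this purpose, so no space is required.)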
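
To see where those "gaps" actually fall, here is a small sketch. It is
written in Python rather than Perl, on the assumption that Python's "re"
module treats \b the same way Perl does for plain ASCII text like this.
The helper name boundary_positions is made up for the illustration.

```python
import re

# Hypothetical helper: list every position in `text` where \b matches.
# A position N means "the gap just before the character at index N".
def boundary_positions(text):
    return [m.start() for m in re.finditer(r"\b", text)]

# No spaces anywhere, yet there are plenty of word boundaries:
# every switch between a \w character and a \W character is one.
print(boundary_positions("stuff.0-go.org"))
# prints [0, 5, 6, 7, 8, 10, 11, 14]
```

Position 6 is the gap between "." and "0", which is exactly where the
first \b in the rule has to match.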

See page 180 of the O'Reilly "Programming Perl" book (Third edition).
(Good book, written by a guy named Larry Wall ;)

So that WLS_URI_1 regex is looking for:
start-of-string (for a "uri" rule that is the start of the URI),
followed by the literal character string "http:",
possibly followed by some number of unidentified characters, the
last one of which -must- match the "\W" character class
(note that there could be zero of the above critters, as the ":"
at the end of "http:" nicely matches the "\W" requirement),
followed by the literal character string "0-go", followed by
one random character (note that the unescaped "." is a wildcard),
followed by the literal character string "org", followed by either
something that matches "\W" or the end of the string.
(Thus "orga" would not match here.) The trailing "/i" makes the
whole comparison case-insensitive.
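
The walk-through above can be checked with a few sample URIs. This is
only a sketch in Python, again assuming its "re" module agrees with Perl
for this pattern; the function name hits and the sample URIs (other than
the one from Chris's question) are invented for the test.

```python
import re

# The pattern from the WLS_URI_1 rule, with /i translated to re.I.
WLS_URI_1 = re.compile(r"^http:.*\b0-go.org\b", re.I)

# Hypothetical helper: True if the rule's regex would hit this URI.
def hits(uri):
    return WLS_URI_1.search(uri) is not None

samples = [
    "http://stuff.0-go.org/stuff",  # "." before "0" is \W, "/" after "org" is \W
    "http://0-go.org",              # end of string counts as a boundary
    "http:0-go.org",                # zero extra characters, ":" supplies the \W
    "http://0-goXorg/",             # the unescaped "." matches any character
    "http://x0-go.org/",            # "x" before "0" is \w, so no boundary
    "http://www.0-go.orga/",        # "a" after "org" is \w, so no boundary
]

for uri in samples:
    print(uri, "->", "hit" if hits(uri) else "miss")
```

The first four samples hit and the last two miss. If the intent is to
match only a real dot, the "." would need to be written as "\.".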

Boy, it takes a bunch of words to explain what that little jumble
of regex does. Powerful stuff, these regexes ;)

Dave

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{


