On Friday 05 September 2003 10:21 pm, Christoph Haas wrote:

> On Sat, Sep 06, 2003 at 02:22:00AM +1200, mdew wrote:
> > Using regex "/etc/squid.adservers" I'm attempting to block any URL's
> > with "penis" AND "large" in the url. Basically *penis*large* and
> > *large*penis*  ..I was looking at doing like so..
> >
> > (/large/ && /penis/)
> > (/penis/ && /large/)
>
> See "man 7 regex". I would suggest something like:
> (large.*penis|penis.*large)

Beware of attempting this sort of thing without word boundaries. For
example, there is a town in the north of England called Penistone, and it's
not hard to find several URLs (e.g. in Google) which include the five letters
"penis" without being the sort of thing you're trying to block:

http://www.penistonereinforcements.com

I didn't bother to look for a URL which had "large" somewhere in it as well, 
but it's not hard to imagine such a false positive existing.
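
To make the difference concrete, here is a rough Python sketch comparing the pattern suggested above with a boundary-aware check. The second and third test URLs, and the helper names has_word, loose_match and strict_match, are made up for illustration. The boundary test uses only anchors, alternation and bracket expressions, which POSIX extended regexes also have, but I have not tried it in a Squid ACL, so treat it as an idea rather than a tested config. It also treats digits as word separators, which may or may not be what you want.

```python
import re

# The pattern suggested earlier in the thread: no word boundaries at all.
LOOSE = re.compile(r"(large.*penis|penis.*large)")


def has_word(word, url):
    """True if `word` occurs in `url` with no letter directly before or after it."""
    pattern = r"(^|[^a-z])" + re.escape(word) + r"([^a-z]|$)"
    return re.search(pattern, url.lower()) is not None


def loose_match(url):
    """Block if both strings appear anywhere, in either order."""
    return LOOSE.search(url.lower()) is not None


def strict_match(url):
    """Block only if both appear as whole words (order does not matter)."""
    return has_word("large", url) and has_word("penis", url)


if __name__ == "__main__":
    urls = [
        "http://www.penistonereinforcements.com",               # from this mail
        "http://www.penistonereinforcements.com/large-orders",  # hypothetical
        "http://example.com/large-penis-pills",                 # hypothetical
    ]
    for u in urls:
        print(u, loose_match(u), strict_match(u))
```

With these definitions the hypothetical "/large-orders" URL is caught by the loose pattern but not by the boundary-aware check, while the third URL is caught by both.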

Maybe you're happy to block a few false positive web pages in exchange for a 
higher number of true positives, but it's a choice you should be aware you're 
making.

Antony.

-- 

The only problem with the Universe as a platform, though, is that it is 
currently running someone else's program.

 - Ken Karakotsios, author of SimLife
