Alex, from prypiat. Yes, I recycle.
On 12-09-27 11:09 AM, Bowie Bailey wrote:
> On 9/27/2012 10:41 AM, Alexandre Boyer wrote:
>> Hello all,
>>
>> Here is a small ruleset that I'm working with. I added it to our
>> local ruleset in prod:
>>
>> # BAD LINKS N-NG ;-)
>> # Canada Post
>> uri_detail AJB_CANPOST_BADLINK raw !~ /canadapost\./ text =~ /(?:https?:\/\/(?:www\.)?|www\.)canadapost\./ type =~ /^a$/
>> describe AJB_CANPOST_BADLINK Found a mismatch between href and anchored text pretending to link to www.canadapost.ca
>> score AJB_CANPOST_BADLINK 1.0
>> meta AJB_CANPOST_PHISH_BADTRACKNUM Z_CANPOST_BADLINK && !Z_CANPOST_TRACKNUM
>> describe AJB_CANPOST_PHISH_BADTRACKNUM Mismatch between href and anchored + unofficial tracking number from CanadaPost
>> score AJB_CANPOST_PHISH_BADTRACKNUM 2.0
>> #
>> # youtube
>> uri_detail AJB_UTUBE_BADLINK raw !~ /youtube\./ text =~ /(?:https?:\/\/(?:www\.)?|www\.)youtube\./ type =~ /^a$/
>> describe AJB_UTUBE_BADLINK Found a mismatch between href and anchored text pretending to link to www.youtube.com
>> score AJB_UTUBE_BADLINK 0.5
>> # because of link trackers (from massmailer for example), we must
>> # meta this with other rulz to be sure we face our fake yutube botnet
>> meta AJB_FK_UTUBE_BOTNET Z_UTUBE_BADLINK && Z_EMPTY_SUBJ && MIME_HTML_ONLY
>> describe AJB_FK_UTUBE_BOTNET mismatch between href and anchored + empty subject = botnet
>> score AJB_FK_UTUBE_BOTNET 5.5
>> ##
>> # TODO: check if we could work with DKIM, exists:List-Unsubscribe,
>> # SPF_PASS, RCVD_IN_RP_SAFE, RCVD_IN_RP_CERTIFIED and others
>> # in order to avoid FPs from MassMailers.
>>
>> Note the TODO ;-)
>
> Don't know if it makes much difference in this case, but...
>
> (?:https?:\/\/(?:www\.)?|www\.)

Should catch:
  http://
  https://
  http://www.
  https://www.
  www.

> can be simplified to:
>
> (?:https?:\/\/|www\.)
>

While this catches:
  http://
  https://
  www.
Covering less.
It may be overkill, but my regex has one and only one purpose: to match any kind of "valid" web link as the common user experiences it (i.e. "as seen on TV"). The spammer will try to lure the common user by mimicking what that user is used to seeing, no?

> Since you're not anchoring the front of the regexp or trying to
> capture the match, the results will be the same.
>

I'm not capturing because I don't use the match afterwards. On a small system, this makes no difference. On large systems (millions+ of emails filtered a day), it probably does. I'm guessing here; I don't want to prove this on my own systems :-)

Alex.
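
For anyone who wants to check the "results will be the same" point without touching a production box, here is a minimal sketch in Python rather than SpamAssassin itself. It assumes Python's `re` treats these two patterns the same way Perl does for this purpose, and the sample strings are hypothetical anchor texts made up for the illustration. It only compares whether each pattern is found somewhere in the text; it says nothing about speed.

```python
import re

# The two prefix alternatives from the thread, each followed by the same
# domain fragment used in the uri_detail "text" test.
ORIGINAL = re.compile(r"(?:https?://(?:www\.)?|www\.)canadapost\.")
SIMPLIFIED = re.compile(r"(?:https?://|www\.)canadapost\.")

# Hypothetical anchor texts, invented for this comparison.
SAMPLES = [
    "http://canadapost.ca",
    "https://canadapost.ca",
    "http://www.canadapost.ca",
    "https://www.canadapost.ca",
    "www.canadapost.ca",
    "Track at https://www.canadapost.ca/track",
    "canadapost.ca",        # no scheme and no www.
    "http://example.com",   # unrelated link
]


def hits(pattern, text):
    """Return True if the pattern is found anywhere in text (unanchored)."""
    return pattern.search(text) is not None


def compare(samples):
    """Return (text, original_hit, simplified_hit) for each sample."""
    return [(s, hits(ORIGINAL, s), hits(SIMPLIFIED, s)) for s in samples]


if __name__ == "__main__":
    for text, a, b in compare(SAMPLES):
        print(f"{text!r:45} original={a!s:5} simplified={b!s:5}")
```

On these samples both patterns give the same yes/no answer. What differs is the matched text: for "http://www.canadapost.ca" the longer pattern matches from "http://", while the shorter one matches from "www.". That only matters if the match is captured or the pattern is anchored at the front.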