Hiyo!
Having a problem. Maybe with the RE, or with how SA processes HTML...
I set up a rule to look for any.garbage.anything.stuff.realdomain.tld -
anything with a bunch of bogus host names on the front:
uri LOC_HTMLLONGHREF /(?:[^"\/\.]+\.){5,}[^"\/\.]+/i
describe LOC_HTMLLONGHREF href has too many 'hostnames' in domain
score LOC_HTMLLONGHREF 0.5
If I got it right, the RE says:
Look for five or more occurrences of
( one or more of anything except a " or / or .
followed by a . )
All followed by a string of anything except a " or / or .
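A quick way to check that reading outside SA is to run the same pattern over a few strings. This is only a sketch in Python, on the assumption that the pattern behaves the same in Python's re module as in the Perl regex SA uses; has_long_hostname and the sample strings are made up for illustration.

```python
import re

# Python version of the rule's pattern. Assumption: equivalent to the Perl
# regex in the rule; the backslashes inside the class are not needed here.
LONG_HREF = re.compile(r'(?:[^"/.]+\.){5,}[^"/.]+', re.IGNORECASE)

def has_long_hostname(text: str) -> bool:
    """True if the text contains five or more dot-terminated chunks
    followed by one more chunk, with no '"' or '/' in between."""
    return LONG_HREF.search(text) is not None

samples = [
    "http://any.garbage.anything.stuff.realdomain.tld/",  # intended target
    "http://www.sunnymail.info?id=1094",                  # href as displayed
    "http://www.sunnymail.info?id=3D1094",                # href as stored raw
    "one. two. three. four. five. six",                   # plain text, not a URI
]
for s in samples:
    print(has_long_hostname(s), s)
```

On these strings the href does not match in either form, since it only has two dots. The last sample does match, because the character class excludes only ", / and . and so also accepts spaces. If the rule were ever applied to something longer than a bare URI, any run of dotted text would satisfy it. Whether that is what happened here is not something this sketch can show.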
Yet it triggered on what looks like a regular URL. There is
definitely something 'weird' about the mail. It has a 'quoted-printable'
part, which repeats the plain text mail, and it includes a
couple of '3D' codes in places where I think they might mess up SA's
scanning of the HTML and URIs. A new spammer trick?
Relevant e-mail bits:
X-Spam-Status: No, hits=3.0 required=3.5 autolearn=no tests=HTML_20_30=0.474,
HTML_MESSAGE=0.001,LOC_HTMLLONGHREF=0.5,NO_DNS_FOR_FROM=1.105,
PRIORITY_NO_NAME=0.831,RCVD_IN_SORBS=0.1
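To see how much the rule contributes, the per-test scores in that header can be added up. A minimal sketch, with the test list copied from the header above; the variable names are just for illustration.

```python
# Test list copied from the X-Spam-Status header quoted above.
status = (
    "HTML_20_30=0.474,HTML_MESSAGE=0.001,LOC_HTMLLONGHREF=0.5,"
    "NO_DNS_FOR_FROM=1.105,PRIORITY_NO_NAME=0.831,RCVD_IN_SORBS=0.1"
)

# Split "NAME=score" pairs into a dict of floats.
scores = {}
for item in status.split(","):
    name, value = item.split("=")
    scores[name] = float(value)

total = sum(scores.values())
without_rule = total - scores["LOC_HTMLLONGHREF"]
print(f"total={total:.3f}  without LOC_HTMLLONGHREF={without_rule:.3f}")
```

The sum is 3.011, which matches the reported hits=3.0 after rounding. Without LOC_HTMLLONGHREF it would be 2.511, so the mail is under the 3.5 threshold either way.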
This is how the HTML is quoted when I just hit my 'reply' in PINE:
!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
HTML>HEAD>
META http-equiv=Content-Type content="text/html; charset=utf-8">
STYLE></STYLE>
/HEAD>
BODY bgColor=#ffffff>
Hello, [EMAIL PROTECTED]>
These btihces don't want to fuck with you? Show WHO THE REAL MAN
IS!BR>BR>BR>
JOIN NOW and FORCE THEM!BR>BR>
Get to know about REAL BURTE RAPE!BR>BR>
Tons of donwoldabale moives, phtoos and stroies!BR>BR>
A href="http://www.sunnymail.info?id=1094">down mouse button there/A>
dubtzkau ckylcohoup. dnodnad.BR>BR>BR>nylun upybreube; xiuramiri
tigawemus
mokmroa gblcla.>
wortrvanenurm mnogicagaz- racsuzwjigihaa.eowhifr. ygieml ndmbd.BR>BR>
nhenugmogmp quogarnulirr; usbuoei eregespaw.BR><Bhuutrj. topoct-
aptyrtli.BR>
lmlyisfa.BR>BR>
aeghuater hsckhmaj dduknioev- jabulet misairuurf.br>BR>
doidazi.BR>BR>/BODY>/HTML>
I've snipped all the left-angle brackets to avoid having anyone's e-mail
parse the HTML, but I notice there is a right angle bracket and a left
bracket in the obfuscating text, in the last seven lines, above.
Now here is what it looks like when I edit my mailbox as raw text.
Notice the '3D' on the 'meta' line, and in the 'href'.....
META http-equiv=3DContent-Type content=3D"text/html; charset=3Dutf-8">
STYLE>/STYLE>
/HEAD>
BODY bgColor=3D#ffffff>
Hello, [EMAIL PROTECTED]>
These btihces don't want to fuck with you? Show WHO THE REAL MAN =
IS!BR>BR>BR>
JOIN NOW and FORCE THEM!BR>BR>
Get to know about REAL BURTE RAPE!BR>BR>
Tons of donwoldabale moives, phtoos and stroies!BR>BR>
<A href=3D"http://www.sunnymail.info?id=3D1094">down mouse butto=
n there/A>
dubtzkau ckylcohoup. dnodnad.BR>BR>BR>=
Gee, does the line-break "=" in the middle of the URI mess up the test?
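For reference, in quoted-printable encoding "=3D" is an encoded "=" and a "=" at the very end of a line is a soft line break that the decoder removes. A small sketch using Python's standard quopri module on the anchor line as it appears in the raw mailbox (including the left bracket I snipped from the closing tag) shows what a decoder should hand back. It says nothing about whether SA saw the decoded or the raw form when it ran the rule.

```python
import quopri

# The anchor line as quoted from the raw mailbox, soft line break included.
raw = (
    b'<A href=3D"http://www.sunnymail.info?id=3D1094">down mouse butto=\n'
    b'n there/A>'
)

# decodestring turns =3D back into "=" and joins lines split by a trailing "=".
decoded = quopri.decodestring(raw)
print(decoded.decode("ascii"))
```

After decoding, the href reads http://www.sunnymail.info?id=1094 and the link text is back on one line, which matches what PINE shows in the reply view.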
If this is a spammer trick, I'm thinking that my 'longhref'
test may be doing a better job than I ever intended... :-)
- Charles