On Fri, 23 Jan 2009, Dennis Hardy wrote:

Why are those scores low? What gives them a negative score?
Those rules have quite high scores...

Here is an example (without my rules): http://pastebin.com/m4400a74d

Can you repost that with full headers?

The ones that get through are relatively short and simple, and many are very "clean".

No DNSBL hits on the URI domain?

I've been thinking about maybe writing an SA plugin that counts the three repeated URL patterns that are always present in all of these spams, but I don't know where to start in trying to do that.

We'd need more than one sample URI to do a good job. Have you been collecting a corpus?

I notice that this URI has a format that may be a good spam sign: the domain name, followed by a long string of unpunctuated text gibberish.

Just off the top of my head and untested, how does this do against your corpus?

  uri GIBBERISH m;://[^/]{4,50}/(?=[a-z]{25,80}$)[a-z]{0,80}q[^u][a-z]{0,80}$;i
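
If you want a quick sanity check of that pattern before loading it into SA, something like the following rough Python sketch may help. The function name and the sample URIs are made up for illustration, and Python's re module is close to but not identical to Perl's regex engine, so treat this as a first pass and not a substitute for running the rule against a real corpus.

```python
import re

# Same pattern as the GIBBERISH rule above, translated to Python.
# It looks for: "://", a host of 4-50 non-slash characters, a slash,
# then a path made only of 25-80 letters that contains a "q" which is
# followed by some letter other than "u".
GIBBERISH = re.compile(
    r"://[^/]{4,50}/(?=[a-z]{25,80}$)[a-z]{0,80}q[^u][a-z]{0,80}$",
    re.IGNORECASE,
)


def looks_like_gibberish(uri: str) -> bool:
    """Return True if the URI matches the gibberish-path pattern."""
    return GIBBERISH.search(uri) is not None


if __name__ == "__main__":
    # Hypothetical sample URIs, not taken from any real spam.
    samples = [
        "http://example.com/abcdefghijklmnopqrstuvwxyzabc",
        "http://example.com/thequickbrownfoxjumpsoverthelazydog",
        "http://example.com/index.html",
    ]
    for uri in samples:
        print(looks_like_gibberish(uri), uri)
```

The "q not followed by u" test is what separates random letter strings from long runs of ordinary English words, so expect misses on gibberish paths that happen to contain no "q" at all.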

