On Fri, 23 Jan 2009, Dennis Hardy wrote:


Why are those scores low? What gives them a negative score?
Those rules have quite high scores...

Here is an example (without my rules):  http://pastebin.com/m4400a74d

Can you repost that with full headers?

The ones that get through are relatively short and simple, and many are very "clean".

No DNSBL hits on the URI domain?

I've been thinking about writing an SA plugin that counts the three repeated URL patterns that appear in every one of these spams, but I don't know where to start.

We'd need more than one sample URI to do a good job. Have you been collecting a corpus?

I notice that this URI has a format that may be a good spam sign: the domain name, followed by a long, unpunctuated string of gibberish text.

Just off the top of my head and untested, how does this do against your corpus?

  uri GIBBERISH m;://[^/]{4,50}/(?=[a-z]{25,80}$)[a-z]{0,80}q[^u][a-z]{0,80}$;i
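
If it helps to try the pattern on a list of URIs before loading it into SA, here is a rough Python sketch of the same expression. The function names and the idea of feeding it a plain list of URIs are my own assumptions for illustration, not anything SA provides, and Python's re module is close to but not identical to Perl's regex engine, so treat the results as approximate. The `;` delimiters and the trailing `i` flag from the rule become re.IGNORECASE here.

```python
import re

# Same expression as the GIBBERISH rule, minus the rule delimiters.
# It looks for "://host/" followed by a path made up only of 25-80
# letters that contains a "q" not followed by "u".
GIBBERISH = re.compile(
    r"://[^/]{4,50}/(?=[a-z]{25,80}$)[a-z]{0,80}q[^u][a-z]{0,80}$",
    re.IGNORECASE,
)


def is_gibberish_uri(uri: str) -> bool:
    """Return True if the URI matches the gibberish-path pattern."""
    return GIBBERISH.search(uri) is not None


def hit_rate(uris) -> float:
    """Fraction of the given URIs that match (0.0 for an empty list)."""
    uris = list(uris)
    if not uris:
        return 0.0
    return sum(is_gibberish_uri(u) for u in uris) / len(uris)
```

Running hit_rate() over the URIs pulled from a spam corpus and again over a ham corpus should give a first idea of whether the pattern is worth scoring.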

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Gun Control is nothing more than an attempt to return to feudalism,
  where the peasants are helpless and must humbly petition their lord
  and master to protect them from bandits and thieves (when they can
  get around to it), and where the lords and masters can abuse the
  peasants whenever they like without fear of effective resistance.
-----------------------------------------------------------------------
 4 days until Wolfgang Amadeus Mozart's 253rd Birthday