Hello Sniffer,

  I think we've identified the cause of some reports of spam leakage
  over the past few weeks.

  I've been testing submitted messages against customer rulebases and
  I've noted that in almost every case there were rules that matched
  the messages.

  One of the customers testing with me pointed out that the results
  from my tests included primarily rules from groups 60 and 62. These
  are the Experimental IP and Experimental Abstract rule groups,
  respectively.

  After that I reviewed other test results and found a common thread.

  We have been announcing on the list that the content of our
  Experimental rule groups has been changing and that we have made
  these groups significantly more accurate in recent weeks.

  One of the changes we have made in these weeks is to increase the
  number of rules that are generated automatically from our spamtraps.
  The auto-rule AI runs every 20 minutes, much more frequently than we
  can manually review the incoming spam; as a result, the system is
  much more responsive to new spam.

  All of these rules are placed in the appropriate experimental rule
  groups. As a result, over the past few weeks a greater number of new
  rules have been generated in these groups rather than manually into
  other groups. This trend will continue over time.

  We have not implemented (and probably will not implement) a practice
  of recoding these rules into specific content categories, because
  this would be of little value. It turns out that the vast majority of
  the
  rule candidates generated by the AI are of the type that spammers
  re-use for multiple campaigns. For example, we might see a Snake-oil
  spam, a porn-spam, and a get-rich spam all within the same week
  using the same throw-away domain detected by our AI.

  If you are using a weighting system such as Declude and you have
  not yet revisited your weights on groups 60 and 62, then you are
  probably seeing more "spam leakage" as a result.

  I recommend that you review your weights using a combination of your
  current experiences and the spam test quality analysis found here:

  <http://www2.spamchk.com/public.html>

  One formula that you can use to derive your test weights from this
  analysis is W = (SA^2)*HOLD_WEIGHT. So, in the case of these two
  groups you might select these weights for your system:

  SNIFFER-IP(60), estimated accuracy 81%, (.81)*(.81) => .6561,
    recommended weight: 66% of hold weight.

  SNIFFER-EXP(62), estimated accuracy 92%, (.92)*(.92) => .8464,
    recommended weight: 85% of hold weight.
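  The calculation above can be sketched in a few lines of Python. This
  is only an illustration of the W = (SA^2)*HOLD_WEIGHT formula from
  this message; the hold_weight value is a placeholder for whatever
  hold weight your own Declude configuration uses, and the function and
  variable names are mine, not part of any Sniffer or Declude API.

```python
# Sketch of the weight formula W = (SA^2) * HOLD_WEIGHT.
# Squaring the estimated accuracy (SA) discounts a group's weight
# more steeply as its accuracy drops.

def recommended_weight(estimated_accuracy: float, hold_weight: float) -> float:
    """Derive a test weight from a rule group's estimated accuracy."""
    return (estimated_accuracy ** 2) * hold_weight

# Estimated accuracies for the two groups discussed above.
groups = {
    "SNIFFER-IP(60)": 0.81,   # Experimental IP
    "SNIFFER-EXP(62)": 0.92,  # Experimental Abstract
}

hold_weight = 10.0  # example placeholder; substitute your own hold weight

for name, accuracy in groups.items():
    w = recommended_weight(accuracy, hold_weight)
    print(f"{name}: weight {w:.2f} ({accuracy ** 2:.0%} of hold weight)")
```

  With a hold weight of 10, this yields roughly 6.56 for group 60 (66%
  of hold weight) and 8.46 for group 62 (85% of hold weight), matching
  the figures above.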

  We are continuing to refine these processes and improve our accuracy
  so it is a good idea to review these settings periodically for the
  best performance.

  The days of the Gray-Hosting group with a high false positive rate
  are long gone and will not return ;-)

Thanks,
_M

Pete McNeil (Madscientist)
President, MicroNeil Research Corporation
Chief SortMonster (www.sortmonster.com)



This E-Mail came from the Message Sniffer mailing list. For information and 
(un)subscription instructions go to 
http://www.sortmonster.com/MessageSniffer/Help/Help.html
