Matic Dolar wrote to [EMAIL PROTECTED]:

Hi!

I'm writing a study about spam and would like to do the following:
- run spamassassin against a whole mailbox file (presumably quite a
  large one) full of mixed messages: both spam and ham, as well as a
  significant number of viruses (already removed by the antivirus
  scanner).
- analyze the results I get: amount of spam received, amount of spam
  caught, etc.

If you really want to do this right, you should use the tools in the "masses" subdirectory of the SA distribution. You'll want to split the mailboxes up into separate spam and ham corpora, and run mass-check on each. From there, the spam.log and ham.log you obtain should tell you almost everything you need to know.

http://wiki.apache.org/spamassassin/MassCheck
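
As a rough illustration of the splitting step, here is a minimal Python sketch that copies a mixed mbox file into separate spam and ham mbox files. It is not part of the SA distribution. The function name split_mbox and the is_spam callback are hypothetical; you would have to supply is_spam yourself, for example from a hand-sorted list of Message-IDs, since the corpora need to be classified independently of SpamAssassin's own verdict.

```python
import mailbox


def split_mbox(src_path, spam_path, ham_path, is_spam):
    """Copy every message in src_path into spam_path or ham_path.

    is_spam is a caller-supplied function (hypothetical, not part of
    SpamAssassin) that takes a message and returns True for spam.
    Returns a (spam_count, ham_count) tuple.
    """
    src = mailbox.mbox(src_path, create=False)  # fail if the source is missing
    spam = mailbox.mbox(spam_path)              # created if it does not exist
    ham = mailbox.mbox(ham_path)
    spam_count = 0
    ham_count = 0
    try:
        for msg in src:
            if is_spam(msg):
                spam.add(msg)
                spam_count += 1
            else:
                ham.add(msg)
                ham_count += 1
        # Write the new messages out to disk.
        spam.flush()
        ham.flush()
    finally:
        for box in (src, spam, ham):
            box.close()
    return spam_count, ham_count
```

The two output files can then be handed to mass-check as the spam and ham corpora; see the wiki page above for the actual invocation.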

The masses stuff is designed with this purpose in mind. Although it is
tilted more toward rule analysis and scoring, those results are a
superset of most other results you might want to obtain, such as average
ham/spam scores, overall tagging accuracy, etc.
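
To give an idea of how those summary numbers could be pulled out of the logs, here is a minimal Python sketch. It assumes each non-comment log line is whitespace-separated, with a hit flag in the first field and the message score in the second; check that against the logs your version of mass-check actually writes. The function names and the 5.0 threshold are my own assumptions (5.0 is the usual default required score), not anything taken from the masses tools.

```python
def read_scores(lines):
    """Extract message scores from mass-check log lines.

    Assumed line layout (verify against your own logs):
        <flag> <score> <path> <rules> ...
    Comment lines starting with '#' and malformed lines are skipped.
    """
    scores = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            scores.append(float(fields[1]))
        except ValueError:
            continue  # second field is not a number; ignore the line
    return scores


def summarize(spam_scores, ham_scores, threshold=5.0):
    """Compute average scores and tagging accuracy at a given threshold.

    A message counts as tagged when its score is >= threshold.
    """
    def mean(values):
        return sum(values) / len(values) if values else 0.0

    spam_caught = sum(1 for s in spam_scores if s >= threshold)
    false_positives = sum(1 for s in ham_scores if s >= threshold)
    total = len(spam_scores) + len(ham_scores)
    correct = spam_caught + (len(ham_scores) - false_positives)
    return {
        "spam_total": len(spam_scores),
        "spam_caught": spam_caught,
        "false_negatives": len(spam_scores) - spam_caught,
        "ham_total": len(ham_scores),
        "false_positives": false_positives,
        "avg_spam_score": mean(spam_scores),
        "avg_ham_score": mean(ham_scores),
        "accuracy": correct / total if total else 0.0,
    }
```

Typical use would be something like summarize(read_scores(open("spam.log")), read_scores(open("ham.log"))), which covers the "amount of spam received" and "amount of spam caught" figures asked about.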

- Ryan

--
  Ryan Thompson <[EMAIL PROTECTED]>

  SaskNow Technologies - http://www.sasknow.com
  901-1st Avenue North - Saskatoon, SK - S7K 1Y4

        Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
  Toll-Free: 877-727-5669     (877-SASKNOW)     North America
