Matic Dolar wrote to [EMAIL PROTECTED]:
Hi!
I'm writing a study about spam and would like to do the following:
- run SpamAssassin against a whole mailbox file (presumably quite a large one) full of mixed messages: both spam and ham, as well as a significant number of viruses (albeit removed by an antivirus scanner);
- analyze the results I get: amount of spam received, amount of spam caught, etc.
If you really want to do this right, you should use the tools in the "masses" subdirectory of the SA distribution. You'll want to split the mailboxes up into separate spam and ham corpora, and run mass-check on each. From there, the spam.log and ham.log you obtain should tell you almost everything you need to know.
http://wiki.apache.org/spamassassin/MassCheck
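Splitting a mixed mailbox has to start from a human judgement of which messages are spam. SpamAssassin's own verdict can't be used for this, since that verdict is exactly what is being measured. Here is a minimal sketch. It assumes the mailbox is in mbox format and that you have already collected the Message-ID headers of the messages you judged to be spam. The `split_mbox` function and the `spam_ids` set are hypothetical and not part of SA. With those in hand, Python's standard `mailbox` module can write out the two corpora:

```python
import mailbox


def split_mbox(mixed_path, spam_ids, spam_path, ham_path):
    """Copy every message in a mixed mbox into a spam mbox or a ham mbox.

    spam_ids: hypothetical set of Message-ID header values that were
    hand-classified as spam. Every other message (including any without
    a Message-ID header) is written to the ham mbox.
    Returns a dict with the number of messages written to each file.
    """
    counts = {"spam": 0, "ham": 0}
    src = mailbox.mbox(mixed_path, create=False)  # must already exist
    spam_box = mailbox.mbox(spam_path)  # created if it does not exist
    ham_box = mailbox.mbox(ham_path)
    try:
        for msg in src:
            msg_id = (msg["Message-ID"] or "").strip()
            if msg_id in spam_ids:
                spam_box.add(msg)
                counts["spam"] += 1
            else:
                ham_box.add(msg)
                counts["ham"] += 1
    finally:
        # close() flushes any pending additions to disk
        src.close()
        spam_box.close()
        ham_box.close()
    return counts
```

The classification is the slow part. The script only moves messages once you've decided which is which. The resulting spam and ham mbox files can then be given to mass-check as the two corpora (see the MassCheck wiki page for the target syntax).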
The masses stuff is designed with this purpose in mind. Although it is tilted more toward rule analysis and scoring, those results are a superset of most other results you might want to obtain, such as average ham/spam scores, overall tagging accuracy, etc.
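Once mass-check has produced spam.log and ham.log, figures like these take only a few lines of scripting. The sketch below makes some assumptions about the log layout, so check it against your own logs before relying on it. It assumes each non-comment line starts with a Y/. flag, followed by the message score and then the message path, all separated by whitespace. A message counts as tagged when its score reaches `threshold`, which defaults here to SpamAssassin's stock required_score of 5.0. The `summarize_log`/`report` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogSummary:
    total: int         # messages listed in the log
    tagged: int        # messages scoring at or above the threshold
    mean_score: float  # average SpamAssassin score


def summarize_log(lines, threshold=5.0):
    """Summarize one mass-check log (assumed layout: flag score path rules...)."""
    scores = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment headers
            continue
        scores.append(float(line.split()[1]))  # assumed second field: the score
    tagged = sum(1 for s in scores if s >= threshold)
    mean = sum(scores) / len(scores) if scores else 0.0
    return LogSummary(len(scores), tagged, mean)


def report(spam, ham):
    """Combine the spam.log and ham.log summaries into headline figures."""
    total = spam.total + ham.total
    correct = spam.tagged + (ham.total - ham.tagged)
    return {
        "spam_received": spam.total,
        "spam_caught": spam.tagged,
        "false_negatives": spam.total - spam.tagged,  # spam that got through
        "false_positives": ham.tagged,                # ham wrongly tagged
        "accuracy": correct / total if total else 0.0,
        "mean_spam_score": spam.mean_score,
        "mean_ham_score": ham.mean_score,
    }
```

Usage would look something like `report(summarize_log(open("spam.log")), summarize_log(open("ham.log")))`. The sketch recomputes the verdict from the score rather than trusting the Y/. flag, which lets you see how a different threshold would have changed the results. A message path that contains whitespace would break the simple `split()` parsing.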
- Ryan
-- Ryan Thompson <[EMAIL PROTECTED]>
SaskNow Technologies - http://www.sasknow.com 901-1st Avenue North - Saskatoon, SK - S7K 1Y4
Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon Toll-Free: 877-727-5669 (877-SASKNOW) North America
