Steven Lamb wrote:
> I have a corpus of email and have been trying to get good metrics on it. I
> have run the messages through with spamassassin -t but this only adds stuff
> onto the ends of all of my messages. Is there any way to get a summary of
> the test, i.e. how many are spam, how many are ham, average score and so
> forth? Or even have it separate my messages into different folders? I know
> this is a newbie-ish question but I am indeed a newbie.
If you unpack the source tarball, you'll find a directory called "masses". It contains the tools the developers use to perform mass-checks. You'll want to run mass-check first:

http://wiki.apache.org/spamassassin/MassCheck

From there, feed the resulting spam.log and ham.log files to "hit-frequencies", which will generate a table just like the STATISTICS-*.txt files that come with SA (check the rules subdirectory of the tarball):

http://wiki.apache.org/spamassassin/HitFrequencies
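If all you want is the quick headline numbers you asked about (how many flagged as spam, how many not, average score), a small script over the mass-check logs may be enough. This is a minimal sketch, not one of the tools in "masses". It assumes each non-comment log line starts with a Y/N verdict followed by the score (e.g. "Y 12 /path/to/msg RULE_A,RULE_B"); check that against your own spam.log and ham.log before trusting the numbers. The names summarize_log and main are just illustrative.

```python
import sys


def summarize_log(lines):
    """Tally a mass-check style log.

    ASSUMPTION: each data line looks like 'Y <score> <path> <rules...>'
    or 'N <score> ...', where Y/N is SpamAssassin's spam verdict.
    Lines starting with '#' are treated as comments and skipped.
    """
    counts = {"Y": 0, "N": 0}
    totals = {"Y": 0.0, "N": 0.0}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment/header
        fields = line.split()
        if len(fields) < 2 or fields[0] not in counts:
            continue  # not in the assumed format; ignore it
        try:
            score = float(fields[1])
        except ValueError:
            continue  # score column is not a number
        counts[fields[0]] += 1
        totals[fields[0]] += score
    total = counts["Y"] + counts["N"]
    return {
        "messages": total,
        "flagged_spam": counts["Y"],
        "flagged_ham": counts["N"],
        "mean_score": (totals["Y"] + totals["N"]) / total if total else 0.0,
    }


def main(paths):
    # Print one summary per log file given on the command line.
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            print(f"{path}: {summarize_log(fh)}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as, e.g., "python summarize.py spam.log ham.log". Since mass-check is run against messages you have already sorted into known spam and known ham, "flagged_ham" in spam.log counts missed spam (false negatives), and "flagged_spam" in ham.log counts false positives. For per-rule detail, hit-frequencies is still the better tool.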