At 03:39 PM 11/30/2004, Bob Amen wrote:
Thank you for that very well written and helpful explanation! Now, do you have a script that computes the test values from a SA log file that you'd care to share?

You can't measure any of those performance metrics from logfiles alone. There's no way to determine the FP and FN counts from just the logs; you need a human for that part.


There are really two ways:
1) Set up a pre-sorted corpus pair (one all ham, one all spam), run the filter against each, then calculate. You can detect FPs and FNs by doing separate runs on each half of the corpus: any positives in the ham corpus are FPs, and any negatives in the spam corpus are FNs.


2) Go through your mail and hand-classify all the FPs and FNs, then combine that with the total statistics for that account from your logs. Of course, a script that breaks out the logs by user, spam count and ham count could make that easier. If your logs have the delivery account on the same line as your filter's spam/ham verdict, this could be just a simple pair of greps:
grep "mkettler" /var/log/maillog | grep "is spam" | wc -l
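
A slightly fuller sketch of that pair of greps, tallying both verdicts for one account in one go. It assumes (hypothetically) that every relevant log line carries the account name plus either the phrase "is spam" or "is ham"; the function name count_verdicts is made up for illustration, and the patterns need adjusting to whatever your filter actually logs.

```shell
#!/bin/sh
# count_verdicts LOGFILE USER
# Prints "spam=<n> ham=<n>" for the log lines that mention USER.
# ASSUMPTION: the filter logs the literal phrases "is spam" / "is ham"
# on the same line as the delivery account.
count_verdicts() {
    logfile=$1
    user=$2
    # grep -F matches fixed strings; grep -c prints the number of matching lines.
    # grep exits non-zero when the count is 0, so "|| true" keeps the script going.
    spam=$(grep -F -- "$user" "$logfile" | grep -c -F "is spam" || true)
    ham=$(grep -F -- "$user" "$logfile" | grep -c -F "is ham" || true)
    echo "spam=$spam ham=$ham"
}
```

Called as count_verdicts /path/to/your/log mkettler, it gives the two totals that the hand-counted FPs and FNs get combined with.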


Option 2 involves no advance work, but depending on your log format it can be a painful process.
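
Once you have the hand-counted FPs and FNs plus the per-account totals from the logs, the arithmetic can be sketched roughly as below. This is only an illustration under stated assumptions: the FP rate is taken as FPs divided by the real ham count, the FN rate as FNs divided by the real spam count, and the function name error_rates is hypothetical. The real counts are recovered from the filter's verdicts: real ham = ham verdicts - FN + FP, and real spam = spam verdicts - FP + FN.

```shell
#!/bin/sh
# error_rates FLAGGED_SPAM FLAGGED_HAM FP FN
#   FLAGGED_SPAM / FLAGGED_HAM: the filter's verdict totals from the logs
#   FP / FN: the counts found by checking the mail by hand
# ASSUMPTION: rates are expressed relative to the real ham and spam counts.
error_rates() {
    awk -v flagged_spam="$1" -v flagged_ham="$2" -v fpos="$3" -v fneg="$4" 'BEGIN {
        # Ham verdicts include the missed spam (FN) and lack the false alarms (FP).
        real_ham  = flagged_ham - fneg + fpos
        # Spam verdicts include the false alarms (FP) and lack the missed spam (FN).
        real_spam = flagged_spam - fpos + fneg
        if (real_ham <= 0 || real_spam <= 0) {
            print "not enough mail to compute rates"
            exit 1
        }
        printf "FP rate: %.2f%% of ham\n",  100 * fpos / real_ham
        printf "FN rate: %.2f%% of spam\n", 100 * fneg / real_spam
    }'
}
```

For example, error_rates 900 1100 2 50 treats 1052 messages as real ham and 948 as real spam before dividing.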




