Thursday, June 26, 2003, 11:39:49 AM, Mark wrote:
M> There has been some talk about Popfile on this forum, so I
M> thought I would comment. I have given up on Popfile after an
M> unacceptable number of crucial false positives, very slow
M> system responses, and a lot of work to train.

It's worked well for me.  Far better than the hassle of sorting
through the enormous number of spams I get.  My e-mail address has
been around on the Internet for nearly 10 years. In that time, it's
been added to every spam mailing list in existence! :(  As
spammers change tactics, I end up doing a bit of retraining every
month or so when more spams start leaking through than I find
acceptable.  But the "training" is pretty trivial and easy to do.

I do, however, glance through e-mails categorized as spam before
deleting them. Usually I sort by the "From" address to quickly
see if the e-mails are familiar. Spammers usually have fairly
obvious From addresses.

M> Some of the statistics we hear (99.5% sorting efficiency) are
M> distorted by the a priori probability (i.e., 95% efficiency
M> would be achieved by simply sorting mailing lists and mail from
M> known addresses, so 99.9% really represents 98% or so),
M> meaning that a lot of the not-easily-sorted mail is lost.
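
To spell out the arithmetic behind that argument, here is a rough
sketch. It assumes, as the quoted numbers do, that some fixed share
of mail (95%) is always sorted correctly by trivial rules, so every
error lands in the remaining share. The function name is made up
for illustration and is not part of POPFile.

```python
def hard_mail_accuracy(overall_accuracy, easy_fraction):
    """Accuracy on the not-easily-sorted mail.

    Assumption (not a measured fact): the easy fraction is always
    sorted correctly, so all errors fall in the hard fraction.
    """
    hard_fraction = 1.0 - easy_fraction
    error_rate = 1.0 - overall_accuracy
    return 1.0 - error_rate / hard_fraction


# 99.9% overall with a 95% "free" baseline -> about 98% on hard mail
print(round(hard_mail_accuracy(0.999, 0.95), 3))  # 0.98
# 99.5% overall with the same baseline -> about 90% on hard mail
print(round(hard_mail_accuracy(0.995, 0.95), 3))  # 0.9
```

Under that assumption, the headline figure matters less than how
many of the errors land in the mail you actually care about.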

I classify into 8 buckets:
Emails Classified
Bucket   Classification Count 
bat      2,099 (27.19%)
etrade      47 (0.6%)
list        42 (0.54%)
muscle      89 (1.15%)
mvst       881 (11.41%)
personal   674 (8.73%) 
spam     3,719 (48.18%)
tennis     167 (2.16%)

     Overall accuracy:
    Emails classified: 7,719
Classification errors: 154 

Accuracy: 98%
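
For anyone who wants to check the figure, the accuracy is just one
minus errors over total. A quick sketch using the counts above (the
variable and function names are mine, not anything POPFile exposes):

```python
def accuracy(classified, errors):
    """Fraction of messages that were classified correctly."""
    return 1.0 - errors / classified


def share(count, total):
    """A bucket's share of all classified mail, as a fraction."""
    return count / total


classified = 7719   # "Emails classified" from the stats above
errors = 154        # "Classification errors" from the stats above

print(f"{accuracy(classified, errors):.2%}")  # 98.00%
print(f"{share(3719, classified):.2%}")       # 48.18% (spam bucket)
```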

Most of the errors happen between my swim team (mvst) category
and tennis category, which are tough to tell apart since some of
the same people are in both categories and the words used in the
two can be similar.

This is probably officially off-topic at this point! :)

Dave Kennedy

Current version is 1.62r | "Using TBUDL" information:
