On 11/5/2014 2:12 PM, John Woods wrote:
> I did skim bug 5503 earlier, but didn't understand it at first.
> Knowing the history now, it makes a little more sense, although I'm
> still fuzzy on why the value of "3" for the body and head points is
> important.

Can't disagree. I don't know the history either. I just know that 3 was
the magic number and that the code did not work as documented.

> It might be nice to have local.cf directives that let admins adjust
> $required_body_points and $required_head_points in
> AutoLearnThreshold.pm. That way, admins could tune this behavior
> to allow more or less auto-learning (e.g. 1 body point and 2.5 head
> points). Thoughts?

Agreed. Can you work on a patch to provide this?
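
For illustration only, here is a rough sketch of the check under discussion, written in Python rather than Perl. It assumes the documented behavior: a message is auto-learned as spam only if its score reaches the spam threshold and it has at least the required points from both body and header rules. The class name, the function name, and the two required_*_points settings are hypothetical stand-ins for the proposed directives, not existing SpamAssassin options, and the real plugin performs further checks that are omitted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoLearnConfig:
    # Score thresholds, set to SpamAssassin's documented defaults.
    threshold_spam: float = 12.0
    threshold_nonspam: float = 0.1
    # Hypothetical tunables: these stand in for the hard-coded "3"s.
    required_body_points: float = 3.0
    required_head_points: float = 3.0


def autolearn_decision(score: float, body_points: float, head_points: float,
                       cfg: Optional[AutoLearnConfig] = None) -> Optional[str]:
    """Return "spam", "ham", or None (do not auto-learn). Simplified sketch."""
    cfg = cfg or AutoLearnConfig()
    if score < cfg.threshold_nonspam:
        return "ham"
    if score >= cfg.threshold_spam:
        # Spam learning also needs enough evidence from body AND header rules.
        if (body_points >= cfg.required_body_points
                and head_points >= cfg.required_head_points):
            return "spam"
    return None
```

With the defaults, a message scoring 15 with 1 body point and 2.5 head points is not learned; with a config of required_body_points=1.0 and required_head_points=2.5 it would be learned as spam.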

> As for Bayes strategies (and without starting a flamewar), we just
> started implementing an IMAP folder in everyone's mailbox called
> "Learn As Spam", that gets processed through "sa-learn --spam". It
> sounds like we may need to leave auto-learning to SA's defaults, and
> ask users to put e-mails in "Learn As Spam" and "Learn As Non-Spam"
> folders. Perhaps relying on out-of-the-box auto-learning, and
> tempering Bayes with user-based learning, may yield positive results.

Agreed. Hand-sorted corpora for spam and ham will lead to the best
Bayes results, and the system you are implementing is the closest
practical way to get them.
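
As a rough sketch of how such folders might be fed to sa-learn from a periodic job: the Maildir++ layout (".Learn As Spam/cur" under each user's Maildir) and both function names are assumptions for illustration, so adjust them to however your IMAP server stores mail. Only the sa-learn --spam and --ham modes with a directory argument are relied on.

```python
import subprocess
from pathlib import Path


def build_sa_learn_command(folder, is_spam):
    """Build the sa-learn argument list for one folder of messages."""
    mode = "--spam" if is_spam else "--ham"
    return ["sa-learn", mode, str(folder)]


def learn_from_folders(maildir_root):
    """Run sa-learn over a user's training folders, skipping missing ones."""
    root = Path(maildir_root)
    # Assumed Maildir++ folder names; these are not fixed by SpamAssassin.
    jobs = [
        (root / ".Learn As Spam" / "cur", True),
        (root / ".Learn As Non-Spam" / "cur", False),
    ]
    for folder, is_spam in jobs:
        if folder.is_dir():
            subprocess.run(build_sa_learn_command(folder, is_spam), check=True)
```

Calling learn_from_folders on each user's Maildir from cron would cover both folders; moving or expiring messages after learning is left out here.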
Regards,
KAM