Re: [vox-tech] Training SpamAssassin's Bayesian filter
On Thursday 06 November 2003 10:34 am, [EMAIL PROTECTED] wrote:
> On Thu 06 Nov 03, 10:17 AM, Ryan Castellucci <[EMAIL PROTECTED]> said:
> > The other neat thing SpamAssassin can do, with Bayesian filtering, is
> > autolearning. If the score is above or below a configurable level, it
> > automatically trains on it, as spam or ham respectively.
> >
> > For example
> >
> > X-Spam-Status: No, hits=-10.9 required=6.0
> >         tests=EMAIL_ATTRIBUTION,HABEAS_SWE,IN_REP_TO,KNOWN_MAILING_LIST,
> >               PGP_SIGNATURE,QUOTED_EMAIL_TEXT,REFERENCES,
> >               REPLY_WITH_QUOTES
> >         autolearn=ham version=2.55
> >
> > Unfortunately, there is no way for me to train the instance of
> > SpamAssassin running at my ISP.
>
> as the bogofilter docs point out, this is an awful idea. autolearning
> was recommended by bogofilter in its early stage, then the developers
> rethought it and it's now discouraged.
>
> autolearning is non-linear. this means that infrequent and small
> mistakes have the capacity to snowball into frequent and large mistakes.
>
> if you value your email, i would highly suggest turning autolearning
> off. you can play with the threshold, but i value my ham too much to
> play around with that!

Well, all my spam goes into a folder labeled 'filtered' that I look through
once a week or so. If I could manually train SpamAssassin, I would, and leave
autolearning off.

--
PGP/GPG Fingerprint: 3B30 C6BE B1C6 9526 7A90 34E7 11DF 44F3 7217 7BC7
On pgp.mit.edu, import with `gpg --keyserver pgp.mit.edu --recv-key 72177BC7`
Also available at http://www.cal.net/~ryan/ryan_at_mother_dot_com.asc
___
vox-tech mailing list
[EMAIL PROTECTED]
http://lists.lugod.org/mailman/listinfo/vox-tech
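For readers who do have shell access to the machine running SpamAssassin, the manual training wished for above is normally done with the `sa-learn` command that ships with it, using its `--spam` and `--ham` options. What follows is only a minimal sketch of wrapping that from Python. It assumes `sa-learn` is on the PATH and that hand-checked mail is stored in a file or directory you can point it at; the helper names and the directory names in the usage comment are hypothetical, not anything from the thread.

```python
import subprocess
from pathlib import Path


def sa_learn_command(kind: str, path: Path) -> list[str]:
    """Build an sa-learn invocation that trains on `path` as spam or ham."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", f"--{kind}", str(path)]


def train(kind: str, path: Path) -> None:
    """Run sa-learn on `path`; raises CalledProcessError on a non-zero exit."""
    subprocess.run(sa_learn_command(kind, path), check=True)


# Hypothetical usage, after looking through the week's mail by hand:
#   train("spam", Path.home() / "Mail" / "filtered")
#   train("ham", Path.home() / "Mail" / "good")
```

Keeping the command construction separate from the call that runs it makes it easy to check what would be executed before touching the Bayes database.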
Re: [vox-tech] Training SpamAssassin's Bayesian filter
On Thu 06 Nov 03, 10:17 AM, Ryan Castellucci <[EMAIL PROTECTED]> said:
> On Thursday 06 November 2003 08:58 am, [EMAIL PROTECTED] wrote:
> > On Thu 06 Nov 03, 8:29 AM, R. Douglas Barbieri <[EMAIL PROTECTED]> said:
> > > On Wed, Nov 05, 2003 at 09:59:12PM -0800, Ryan Castellucci wrote:
> > > > On Wednesday 05 November 2003 09:24 pm, Ken Bloom wrote:
> > > > > Will SpamAssassin's Bayesian filter be more effective if I train
> > > > > it on every message that comes through (even ones that its
> > > > > built-in tests have already rejected as spam) or only on false
> > > > > negatives?
> > > >
> > > > Yes, it's much more effective if you train it on all messages.
> > >
> > > Woah. Dumb question, but when did SpamAssassin go Bayesian? It's one of
> > > the reasons I switched away from it to Bogofilter.
> >
> > i was wondering the same thing. it's actually a little difficult
> > finding references to bayesian filtering on sa's website. if you do a
> > google search, most of the results are on LUG mailing lists.
> >
> > according to the sa site, version 2.5 had it.
> >
> > the version i'm using on one of the accounts i own on someone else's
> > machine, 2.43, didn't have it.
> >
> > that's pretty cool. maybe someday /. will have a "bayesian filter
> > shootout" to see who's most effective. ;-) but to be honest,
> > bayesian filtering along with lexical parsing seems to be the most
> > effective (incoming mail to dirac has both). sa's lexical filtering,
> > for me at least, only catches the most obvious spams. i've had to bump
> > up some of the score results to get anything resembling effective. i'm
> > glad they introduced this new functionality.
> >
> > pete
>
> The other neat thing SpamAssassin can do, with Bayesian filtering, is
> autolearning. If the score is above or below a configurable level, it
> automatically trains on it, as spam or ham respectively.
>
> For example
>
> X-Spam-Status: No, hits=-10.9 required=6.0
>         tests=EMAIL_ATTRIBUTION,HABEAS_SWE,IN_REP_TO,KNOWN_MAILING_LIST,
>               PGP_SIGNATURE,QUOTED_EMAIL_TEXT,REFERENCES,
>               REPLY_WITH_QUOTES
>         autolearn=ham version=2.55
>
> Unfortunately, there is no way for me to train the instance of
> SpamAssassin running at my ISP.

as the bogofilter docs point out, this is an awful idea. autolearning
was recommended by bogofilter in its early stage, then the developers
rethought it and it's now discouraged.

autolearning is non-linear. this means that infrequent and small
mistakes have the capacity to snowball into frequent and large mistakes.

if you value your email, i would highly suggest turning autolearning
off. you can play with the threshold, but i value my ham too much to
play around with that!

pete

--
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D
Re: [vox-tech] Training SpamAssassin's Bayesian filter
On Thursday 06 November 2003 08:58 am, [EMAIL PROTECTED] wrote:
> On Thu 06 Nov 03, 8:29 AM, R. Douglas Barbieri <[EMAIL PROTECTED]> said:
> > On Wed, Nov 05, 2003 at 09:59:12PM -0800, Ryan Castellucci wrote:
> > > On Wednesday 05 November 2003 09:24 pm, Ken Bloom wrote:
> > > > Will SpamAssassin's Bayesian filter be more effective if I train it
> > > > on every message that comes through (even ones that its built-in
> > > > tests have already rejected as spam) or only on false negatives?
> > >
> > > Yes, it's much more effective if you train it on all messages.
> >
> > Woah. Dumb question, but when did SpamAssassin go Bayesian? It's one of
> > the reasons I switched away from it to Bogofilter.
>
> i was wondering the same thing. it's actually a little difficult
> finding references to bayesian filtering on sa's website. if you do a
> google search, most of the results are on LUG mailing lists.
>
> according to the sa site, version 2.5 had it.
>
> the version i'm using on one of the accounts i own on someone else's
> machine, 2.43, didn't have it.
>
> that's pretty cool. maybe someday /. will have a "bayesian filter
> shootout" to see who's most effective. ;-) but to be honest,
> bayesian filtering along with lexical parsing seems to be the most
> effective (incoming mail to dirac has both). sa's lexical filtering,
> for me at least, only catches the most obvious spams. i've had to bump
> up some of the score results to get anything resembling effective. i'm
> glad they introduced this new functionality.
>
> pete

The other neat thing SpamAssassin can do, with Bayesian filtering, is
autolearning. If the score is above or below a configurable level, it
automatically trains on it, as spam or ham respectively.

For example

X-Spam-Status: No, hits=-10.9 required=6.0
        tests=EMAIL_ATTRIBUTION,HABEAS_SWE,IN_REP_TO,KNOWN_MAILING_LIST,
              PGP_SIGNATURE,QUOTED_EMAIL_TEXT,REFERENCES,
              REPLY_WITH_QUOTES
        autolearn=ham version=2.55

Unfortunately, there is no way for me to train the instance of SpamAssassin
running at my ISP.
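To make the header above concrete, here is a small Python sketch that parses an `X-Spam-Status` value in the 2.5x layout quoted in the message and mimics the threshold rule described ("above or below a configurable level"). It is an illustration under assumptions, not SpamAssassin's code: the function names and the two threshold values are made up for the example, and the real autolearner applies further conditions beyond a plain score comparison.

```python
import re

# The header value quoted in the message, with its folded continuation lines.
EXAMPLE = """No, hits=-10.9 required=6.0
        tests=EMAIL_ATTRIBUTION,HABEAS_SWE,IN_REP_TO,KNOWN_MAILING_LIST,
              PGP_SIGNATURE,QUOTED_EMAIL_TEXT,REFERENCES,
              REPLY_WITH_QUOTES
        autolearn=ham version=2.55"""


def parse_spam_status(value: str) -> dict:
    """Parse an X-Spam-Status header value into a dictionary."""
    # Folding leaves whitespace after commas in the tests list; close it up.
    flat = re.sub(r",\s+", ",", value)
    verdict = flat.split(",", 1)[0].strip()
    fields = dict(re.findall(r"(\w+)=(\S+)", flat))
    return {
        "is_spam": verdict.lower() == "yes",
        "hits": float(fields["hits"]),
        "required": float(fields["required"]),
        "tests": [t for t in fields.get("tests", "").split(",") if t],
        "autolearn": fields.get("autolearn"),
        "version": fields.get("version"),
    }


def autolearn_decision(score: float, ham_below: float, spam_above: float) -> str:
    """Simplified rule: learn as ham below one level, as spam above another."""
    if score < ham_below:
        return "ham"
    if score > spam_above:
        return "spam"
    return "no"


status = parse_spam_status(EXAMPLE)
# Illustrative thresholds only; the real levels are whatever the site configured.
decision = autolearn_decision(status["hits"], ham_below=0.1, spam_above=12.0)
```

With these illustrative thresholds the score of -10.9 falls on the ham side, which agrees with the `autolearn=ham` field in the header itself.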
Re: [vox-tech] Training SpamAssassin's Bayesian filter
On Thu 06 Nov 03, 8:29 AM, R. Douglas Barbieri <[EMAIL PROTECTED]> said:
> On Wed, Nov 05, 2003 at 09:59:12PM -0800, Ryan Castellucci wrote:
> > On Wednesday 05 November 2003 09:24 pm, Ken Bloom wrote:
> > > Will SpamAssassin's Bayesian filter be more effective if I train it on
> > > every message that comes through (even ones that its built-in tests
> > > have already rejected as spam) or only on false negatives?
> >
> > Yes, it's much more effective if you train it on all messages.
>
> Woah. Dumb question, but when did SpamAssassin go Bayesian? It's one of
> the reasons I switched away from it to Bogofilter.

i was wondering the same thing. it's actually a little difficult
finding references to bayesian filtering on sa's website. if you do a
google search, most of the results are on LUG mailing lists.

according to the sa site, version 2.5 had it.

the version i'm using on one of the accounts i own on someone else's
machine, 2.43, didn't have it.

that's pretty cool. maybe someday /. will have a "bayesian filter
shootout" to see who's most effective. ;-) but to be honest,
bayesian filtering along with lexical parsing seems to be the most
effective (incoming mail to dirac has both). sa's lexical filtering,
for me at least, only catches the most obvious spams. i've had to bump
up some of the score results to get anything resembling effective. i'm
glad they introduced this new functionality.

pete
Re: [vox-tech] Training SpamAssassin's Bayesian filter
On Wed, Nov 05, 2003 at 09:59:12PM -0800, Ryan Castellucci wrote:
> On Wednesday 05 November 2003 09:24 pm, Ken Bloom wrote:
> > Will SpamAssassin's Bayesian filter be more effective if I train it on
> > every message that comes through (even ones that its built-in tests
> > have already rejected as spam) or only on false negatives?
>
> Yes, it's much more effective if you train it on all messages.

Woah. Dumb question, but when did SpamAssassin go Bayesian? It's one of
the reasons I switched away from it to Bogofilter.

--
R. Douglas Barbieri
[EMAIL PROTECTED]
http://www.dooglio.net
GPG Fingerprint : FE6A 6A57 2B95 7594 E534 BFEE 45F1 9E5E F30A 8A27
MIT.edu recv-key: C55B91D4
GPG Public key : http://www.dooglio.net/dooglio.asc
Re: [vox-tech] Training SpamAssassin's Bayesian filter
On Wednesday 05 November 2003 09:24 pm, Ken Bloom wrote:
> Will SpamAssassin's Bayesian filter be more effective if I train it on
> every message that comes through (even ones that its built-in tests
> have already rejected as spam) or only on false negatives?

Yes, it's much more effective if you train it on all messages.
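The two policies in Ken's question can be sketched side by side. The toy below is a plain naive Bayes token counter written for illustration; it is not SpamAssassin's classifier, and the class, the function names, and the four-message corpus are all invented for the example. "Train on errors" here covers both kinds of mistake, of which "only false negatives" is the spam half.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes over the set of tokens in each message."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}  # keyed by is_spam
        self.messages = {True: 0, False: 0}

    def train(self, text, is_spam):
        self.messages[is_spam] += 1
        self.counts[is_spam].update(set(text.lower().split()))

    def spam_probability(self, text):
        log_odds = 0.0
        for token in set(text.lower().split()):
            # Laplace-smoothed fraction of spam / ham messages containing the token.
            p_spam = (self.counts[True][token] + 1) / (self.messages[True] + 2)
            p_ham = (self.counts[False][token] + 1) / (self.messages[False] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


def train_on_everything(clf, labelled):
    """Train on every labelled message; returns how many were learned."""
    learned = 0
    for text, is_spam in labelled:
        clf.train(text, is_spam)
        learned += 1
    return learned


def train_on_errors(clf, labelled, threshold=0.5):
    """Train only on messages the classifier currently gets wrong."""
    learned = 0
    for text, is_spam in labelled:
        predicted_spam = clf.spam_probability(text) > threshold
        if predicted_spam != is_spam:
            clf.train(text, is_spam)
            learned += 1
    return learned


# Invented corpus: (message text, is_spam)
CORPUS = [
    ("cheap pills buy now", True),
    ("meeting notes attached", False),
    ("buy cheap pills today", True),
    ("notes from the meeting", False),
]
```

On a corpus this small the error-driven policy learns from far fewer messages, so its token counts are thinner and its verdicts less confident. That is consistent with the answer given in the thread, but a toy like this does not settle the question for real mail.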