Re: [vox-tech] Training SpamAssassin's Bayesian filter

2003-11-06 Thread Ryan Castellucci
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Thursday 06 November 2003 10:34 am, [EMAIL PROTECTED] wrote:
> On Thu 06 Nov 03, 10:17 AM, Ryan Castellucci <[EMAIL PROTECTED]> said:
> > -BEGIN PGP SIGNED MESSAGE-
> > The other neat thing SpamAssassin can do, with Bayesian filtering, is
> > autolearning. If the score is above or below a configurable level, it
> > automatically trains on it, as spam or ham respectively.
> >
> > For example
> >
> > X-Spam-Status: No, hits=-10.9 required=6.0
> > tests=EMAIL_ATTRIBUTION,HABEAS_SWE,IN_REP_TO,KNOWN_MAILING_LIST,
> >   PGP_SIGNATURE,QUOTED_EMAIL_TEXT,REFERENCES,
> >   REPLY_WITH_QUOTES
> > autolearn=ham version=2.55
> >
> > Unfortunately, there is no way for me to train the instance of SpamAssassin
> > running at my ISP.
>
> as the bogofilter docs point out, this is an awful idea.  autolearning
> was recommended by bogofilter in its early stage, then the developers
> rethought it and it's now discouraged.
>
> autolearning is non-linear.  this means that infrequent and small
> mistakes have the capacity to snowball into frequent and large mistakes.
>
> if you value your email, i would highly suggest turning autolearning
> off.  you can play with the threshold, but i value my ham too much to
> play around with that!

Well, all my spam goes into a folder labeled 'filtered' that I look through
once a week or so. If I could manually train SpamAssassin, I would, and leave
autolearning off.

- -- 
PGP/GPG Fingerprint: 3B30 C6BE B1C6 9526 7A90  34E7 11DF 44F3 7217 7BC7
On pgp.mit.edu, import with `gpg --keyserver pgp.mit.edu --recv-key 72177BC7`
Also available at http://www.cal.net/~ryan/ryan_at_mother_dot_com.asc
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/qqBsEd9E83IXe8cRAue0AJ9wbIlcyw67tf4PK607Jxm7ECXyrgCgh4um
QpF1sU404S5BOeOtAyamfFk=
=GOO5
-END PGP SIGNATURE-
___
vox-tech mailing list
[EMAIL PROTECTED]
http://lists.lugod.org/mailman/listinfo/vox-tech


Re: [vox-tech] Training SpamAssassin's Bayesian filter

2003-11-06 Thread p
On Thu 06 Nov 03, 10:17 AM, Ryan Castellucci <[EMAIL PROTECTED]> said:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On Thursday 06 November 2003 08:58 am, [EMAIL PROTECTED] wrote:
> > On Thu 06 Nov 03,  8:29 AM, R. Douglas Barbieri <[EMAIL PROTECTED]> said:
> > > On Wed, Nov 05, 2003 at 09:59:12PM -0800, Ryan Castellucci wrote:
> > > > -BEGIN PGP SIGNED MESSAGE-
> > > > Hash: SHA1
> > > >
> > > > On Wednesday 05 November 2003 09:24 pm, Ken Bloom wrote:
> > > > > Will SpamAssassin's Bayesian filter be more effective if I train it on
> > > > > every message that comes through (even ones that its built-in tests
> > > > > have already rejected as spam) or only on false negatives?
> > > >
> > > > Yes, it's much more effective if you train it on all messages.
> > >
> > > Woah. Dumb question, but when did SpamAssassin go Bayesian? It's one of
> > > the reasons I switched away from it to Bogofilter.
> >
> > i was wondering the same thing.  it's actually a little difficult
> > finding references to bayesian filtering on sa's website.  if you do a
> > google search, most of the results are on LUG mailing lists.
> >
> > according to the sa site, version 2.5 had it.
> >
> >
> > the version i'm using on one of the accounts i own on someone else's
> > machine, 2.43, didn't have it.
> >
> > that's pretty cool.  maybe someday /. will have a "bayesian filter
> > shootout" to see who's most effective.   ;-)   but to be honest,
> > bayesian filtering along with lexical parsing seems to be the most
> > effective (incoming mail to dirac has both).  sa's lexical filtering,
> > for me at least, only catches the most obvious spams.  i've had to bump
> > up some of the score results to get anything resembling effective.  i'm
> > glad they introduced this new functionality.
> >
> > pete
> 
> The other neat thing SpamAssassin can do, with Bayesian filtering, is
> autolearning. If the score is above or below a configurable level, it
> automatically trains on it, as spam or ham respectively.
> 
> For example
> 
> X-Spam-Status: No, hits=-10.9 required=6.0
> tests=EMAIL_ATTRIBUTION,HABEAS_SWE,IN_REP_TO,KNOWN_MAILING_LIST,
>   PGP_SIGNATURE,QUOTED_EMAIL_TEXT,REFERENCES,
>   REPLY_WITH_QUOTES
> autolearn=ham version=2.55
> 
> Unfortunately, there is no way for me to train the instance of SpamAssassin
> running at my ISP.
 
as the bogofilter docs point out, this is an awful idea.  autolearning
was recommended by bogofilter in its early stage, then the developers
rethought it and it's now discouraged.

autolearning is non-linear.  this means that infrequent and small
mistakes have the capacity to snowball into frequent and large mistakes.

if you value your email, i would highly suggest turning autolearning
off.  you can play with the threshold, but i value my ham too much to
play around with that!
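
to make the snowball effect concrete, here is a toy sketch in Python.  it is NOT how SpamAssassin or bogofilter actually score mail: the `ToyBayes` class, the `autolearn` function, the token lists and the 0.4 / 0.6 thresholds are all made up for illustration.  it only shows the feedback loop, where one spam that slips under the ham threshold gets learned as ham and drags the next one down with it.

```python
from collections import Counter


class ToyBayes:
    """Toy per-token spam estimator; NOT SpamAssassin's or bogofilter's algorithm."""

    def __init__(self):
        self.spam = Counter()  # number of learned spam messages containing each token
        self.ham = Counter()   # number of learned ham messages containing each token

    def learn(self, tokens, is_spam):
        target = self.spam if is_spam else self.ham
        target.update(set(tokens))

    def spamminess(self, token):
        # Laplace-smoothed share of learned messages with this token that were spam.
        s, h = self.spam[token], self.ham[token]
        return (s + 1) / (s + h + 2)

    def score(self, tokens):
        # Average token spamminess: 0.0 looks like ham, 1.0 looks like spam.
        unique = set(tokens)
        return sum(self.spamminess(t) for t in unique) / len(unique)


def autolearn(model, tokens, ham_below=0.4, spam_above=0.6):
    """Train the model on its own verdict when the score is confident enough."""
    s = model.score(tokens)
    if s <= ham_below:
        model.learn(tokens, is_spam=False)
        return "ham"
    if s >= spam_above:
        model.learn(tokens, is_spam=True)
        return "spam"
    return "no"


# Hand-trained starting point (hypothetical tokens).
model = ToyBayes()
for _ in range(3):
    model.learn(["list", "patch", "free"], is_spam=False)
model.learn(["offer", "pills"], is_spam=True)

sneaky_spam = ["free", "list", "offer"]  # spam padded with ham-looking words
offer_before = model.spamminess("offer")
first_score = model.score(sneaky_spam)
verdict = autolearn(model, sneaky_spam)  # the mistake: spam learned as ham
offer_after = model.spamminess("offer")
second_score = model.score(sneaky_spam)

print(verdict, offer_before, offer_after, first_score, second_score)
```

in this toy run the padded spam is autolearned as ham, the spamminess of "offer" drops, and the same message scores lower the second time around, so it is even more likely to be learned as ham again.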

pete

-- 
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D


Re: [vox-tech] Training SpamAssassin's Bayesian filter

2003-11-06 Thread Ryan Castellucci
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Thursday 06 November 2003 08:58 am, [EMAIL PROTECTED] wrote:
> On Thu 06 Nov 03,  8:29 AM, R. Douglas Barbieri <[EMAIL PROTECTED]> said:
> > On Wed, Nov 05, 2003 at 09:59:12PM -0800, Ryan Castellucci wrote:
> > > -BEGIN PGP SIGNED MESSAGE-
> > > Hash: SHA1
> > >
> > > On Wednesday 05 November 2003 09:24 pm, Ken Bloom wrote:
> > > > Will SpamAssassin's Bayesian filter be more effective if I train it on
> > > > every message that comes through (even ones that its built-in tests
> > > > have already rejected as spam) or only on false negatives?
> > >
> > > Yes, it's much more effective if you train it on all messages.
> >
> > Woah. Dumb question, but when did SpamAssassin go Bayesian? It's one of
> > the reasons I switched away from it to Bogofilter.
>
> i was wondering the same thing.  it's actually a little difficult
> finding references to bayesian filtering on sa's website.  if you do a
> google search, most of the results are on LUG mailing lists.
>
> according to the sa site, version 2.5 had it.
>
>
> the version i'm using on one of the accounts i own on someone else's
> machine, 2.43, didn't have it.
>
> that's pretty cool.  maybe someday /. will have a "bayesian filter
> shootout" to see who's most effective.   ;-)   but to be honest,
> bayesian filtering along with lexical parsing seems to be the most
> effective (incoming mail to dirac has both).  sa's lexical filtering,
> for me at least, only catches the most obvious spams.  i've had to bump
> up some of the score results to get anything resembling effective.  i'm
> glad they introduced this new functionality.
>
> pete

The other neat thing SpamAssassin can do, with Bayesian filtering, is
autolearning. If the score is above or below a configurable level, it
automatically trains on it, as spam or ham respectively.

For example:

X-Spam-Status: No, hits=-10.9 required=6.0
tests=EMAIL_ATTRIBUTION,HABEAS_SWE,IN_REP_TO,KNOWN_MAILING_LIST,
  PGP_SIGNATURE,QUOTED_EMAIL_TEXT,REFERENCES,
  REPLY_WITH_QUOTES
autolearn=ham version=2.55

Unfortunately, there is no way for me to train the instance of SpamAssassin
running at my ISP.
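
As a rough illustration of the header above, the Python sketch below pulls the score and the autolearn verdict out of an X-Spam-Status line and mimics the threshold rule. It is not SpamAssassin's code: the function names `parse_spam_status` and `autolearn_decision` are hypothetical, and the thresholds 0.0 and 12.0 are placeholder values chosen for the example, not the values any particular installation uses.

```python
import re

# The example header from this message, unfolded onto a single line.
HEADER = (
    "X-Spam-Status: No, hits=-10.9 required=6.0 "
    "tests=EMAIL_ATTRIBUTION,HABEAS_SWE,IN_REP_TO,KNOWN_MAILING_LIST,"
    "PGP_SIGNATURE,QUOTED_EMAIL_TEXT,REFERENCES,REPLY_WITH_QUOTES "
    "autolearn=ham version=2.55"
)


def parse_spam_status(header):
    """Extract the key=value fields of an unfolded X-Spam-Status header."""
    fields = dict(re.findall(r"(\w+)=(\S+)", header))
    return {
        "hits": float(fields["hits"]),
        "required": float(fields["required"]),
        "autolearn": fields.get("autolearn"),
        "tests": fields.get("tests", "").split(","),
    }


def autolearn_decision(score, ham_below, spam_above):
    """Hypothetical threshold rule: learn only when the score is far from the middle."""
    if score <= ham_below:
        return "ham"
    if score >= spam_above:
        return "spam"
    return "no"


status = parse_spam_status(HEADER)
# Placeholder thresholds, assumed for illustration only.
decision = autolearn_decision(status["hits"], ham_below=0.0, spam_above=12.0)
print(status["hits"], status["autolearn"], decision)
```

With these placeholder thresholds, a score of -10.9 falls on the ham side, which agrees with the `autolearn=ham` field in the header.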

- -- 
PGP/GPG Fingerprint: 3B30 C6BE B1C6 9526 7A90  34E7 11DF 44F3 7217 7BC7
On pgp.mit.edu, import with `gpg --keyserver pgp.mit.edu --recv-key 72177BC7`
Also available at http://www.cal.net/~ryan/ryan_at_mother_dot_com.asc
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/qpAcEd9E83IXe8cRAjcrAJ9DJhwHrHHEQROX2cEu0Cr8L1Tx4QCeJjF4
9suAKYZ1USRUSWdfK/x79XA=
=r3R6
-END PGP SIGNATURE-


Re: [vox-tech] Training SpamAssassin's Bayesian filter

2003-11-06 Thread p
On Thu 06 Nov 03,  8:29 AM, R. Douglas Barbieri <[EMAIL PROTECTED]> said:
> On Wed, Nov 05, 2003 at 09:59:12PM -0800, Ryan Castellucci wrote:
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA1
> > 
> > On Wednesday 05 November 2003 09:24 pm, Ken Bloom wrote:
> > > Will SpamAssassin's Bayesian filter be more effective if I train it on
> > > every message that comes through (even ones that its built-in tests
> > > have already rejected as spam) or only on false negatives?
> > 
> > Yes, it's much more effective if you train it on all messages.
> 
> Woah. Dumb question, but when did SpamAssassin go Bayesian? It's one of
> the reasons I switched away from it to Bogofilter.
 
i was wondering the same thing.  it's actually a little difficult
finding references to bayesian filtering on sa's website.  if you do a
google search, most of the results are on LUG mailing lists.

according to the sa site, version 2.5 had it.


the version i'm using on one of the accounts i own on someone else's
machine, 2.43, didn't have it.

that's pretty cool.  maybe someday /. will have a "bayesian filter
shootout" to see who's most effective.   ;-)   but to be honest,
bayesian filtering along with lexical parsing seems to be the most
effective (incoming mail to dirac has both).  sa's lexical filtering,
for me at least, only catches the most obvious spams.  i've had to bump
up some of the score results to get anything resembling effective.  i'm
glad they introduced this new functionality.

pete


-- 
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D


Re: [vox-tech] Training SpamAssassin's Bayesian filter

2003-11-06 Thread R. Douglas Barbieri
On Wed, Nov 05, 2003 at 09:59:12PM -0800, Ryan Castellucci wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On Wednesday 05 November 2003 09:24 pm, Ken Bloom wrote:
> > Will SpamAssassin's Bayesian filter be more effective if I train it on
> > every message that comes through (even ones that its built-in tests
> > have already rejected as spam) or only on false negatives?
> 
> Yes, it's much more effective if you train it on all messages.

Woah. Dumb question, but when did SpamAssassin go Bayesian? It's one of
the reasons I switched away from it to Bogofilter.

> - -- 
> PGP/GPG Fingerprint: 3B30 C6BE B1C6 9526 7A90  34E7 11DF 44F3 7217 7BC7
> On pgp.mit.edu, import with `gpg --keyserver pgp.mit.edu --recv-key 72177BC7`
> Also available at http://www.cal.net/~ryan/ryan_at_mother_dot_com.asc
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.2.2 (GNU/Linux)
> 
> iD8DBQE/qeMwEd9E83IXe8cRAlyeAJ9sbWEc3Xh6FMuOPlV+xN/IIhNe3wCfZ5ED
> Vzr1RDtPeiqOyZGlKxnvqIY=
> =qpF8
> -END PGP SIGNATURE-

-- 
R. Douglas Barbieri
[EMAIL PROTECTED]
http://www.dooglio.net

GPG Fingerprint : FE6A 6A57 2B95 7594 E534  BFEE 45F1 9E5E F30A 8A27
MIT.edu recv-key: C55B91D4
GPG Public key  : http://www.dooglio.net/dooglio.asc


Re: [vox-tech] Training SpamAssassin's Bayesian filter

2003-11-05 Thread Ryan Castellucci
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Wednesday 05 November 2003 09:24 pm, Ken Bloom wrote:
> Will SpamAssassin's Bayesian filter be more effective if I train it on
> every message that comes through (even ones that its built-in tests
> have already rejected as spam) or only on false negatives?

Yes, it's much more effective if you train it on all messages.
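
A minimal sketch of what "training on all messages" could look like on a machine where you run SpamAssassin yourself, assuming the `sa-learn` tool that ships with it and its `--ham` / `--spam` options. The helper names `sa_learn_command` and `train_all` and the two folder paths are hypothetical; nothing is executed unless `run=True` is passed.

```python
import subprocess


def sa_learn_command(kind, path):
    """Build the sa-learn command line for one folder of already-sorted mail."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, path]


def train_all(ham_path, spam_path, run=False):
    """Train on everything: all known-good mail as ham, all caught spam as spam."""
    commands = [
        sa_learn_command("ham", ham_path),
        sa_learn_command("spam", spam_path),
    ]
    if run:
        # Only runs when asked to; requires sa-learn to be installed locally.
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical folder names, shown as a dry run.
for cmd in train_all("Mail/inbox", "Mail/filtered"):
    print(" ".join(cmd))
```

Feeding both folders covers the messages the built-in tests already caught as well as the false negatives you sorted by hand, which is the "all messages" case asked about above.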

- -- 
PGP/GPG Fingerprint: 3B30 C6BE B1C6 9526 7A90  34E7 11DF 44F3 7217 7BC7
On pgp.mit.edu, import with `gpg --keyserver pgp.mit.edu --recv-key 72177BC7`
Also available at http://www.cal.net/~ryan/ryan_at_mother_dot_com.asc
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/qeMwEd9E83IXe8cRAlyeAJ9sbWEc3Xh6FMuOPlV+xN/IIhNe3wCfZ5ED
Vzr1RDtPeiqOyZGlKxnvqIY=
=qpF8
-END PGP SIGNATURE-