> -----Original Message-----
> From: Fletcher Mattox [mailto:[EMAIL PROTECTED]
> 
> Dan,
> 
> Just to be clear, I took that dump before I learned the 500 hams.
> Here is a dump after I learned the hams.  It looks normal to me.
> 
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0      14787          0  non-token data: nspam
> 0.000          0        610          0  non-token data: nham
> 0.000          0     246131          0  non-token data: ntokens
> 0.000          0 1177142672          0  non-token data: oldest atime
> 0.000          0 1179789825          0  non-token data: newest atime
> 0.000          0 1179789837          0  non-token data: last journal sync atime
> 0.000          0 1179761284          0  non-token data: last expiry atime
> 0.000          0      43200          0  non-token data: last expire atime delta
> 0.000          0      90881          0  non-token data: last expire reduction count
> 
> And yes, I was *very* careful about the quality of the ham before
> I learned it.

Are you also confident about the quality of the spam that Bayes learned?

High Bayes scores on ham are more likely caused by some ham having been
learned as spam than by some spam having been learned as ham.

The latter would instead cause some spam to score low, not some ham to score
high.
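
As a quick sanity check on the counts, the spam/ham ratio in each dump can
be computed with a short script. This is only a sketch: it assumes the
"sa-learn --dump magic" layout shown in the dumps in this thread, with the
value sitting in the column just before the final atime column. The names
parse_magic and spam_ham_ratio are hypothetical helpers, not part of
SpamAssassin.

```python
def parse_magic(dump):
    """Map each 'non-token data' label to its integer value.

    Assumes the value is the second-to-last field before the label,
    which also tolerates leading '>' quote markers from email replies.
    """
    values = {}
    for line in dump.splitlines():
        head, sep, label = line.partition("non-token data:")
        if not sep:
            continue  # not a magic line
        fields = head.split()
        if len(fields) >= 2:
            values[label.strip()] = int(fields[-2])
    return values


def spam_ham_ratio(values):
    """Return nspam / nham from a parsed dump."""
    return values["nspam"] / values["nham"]


# Counts copied from the two dumps in this thread.
before = """\
0.000          0      14779          0  non-token data: nspam
0.000          0         86          0  non-token data: nham
"""
after = """\
0.000          0      14787          0  non-token data: nspam
0.000          0        610          0  non-token data: nham
"""

for name, dump in (("before", before), ("after", after)):
    print(f"{name}: {spam_ham_ratio(parse_magic(dump)):.1f}:1")
```

With the counts above this gives roughly 171.8:1 before the hams were
learned and 24.2:1 afterwards. The ratio only says how lopsided the
training is; it says nothing about whether the learned spam is clean.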

Giampaolo


> 
> Fletcher
> 
> Dan Barker writes:
> >You might review the runs of those 500 hams you think you trained.
> >Only 86 hams show in your dump magic, so the training either failed
> >(all dups?) or went into a different database (easy to do!).
> >
> >Dan
> >
> >-----Original Message-----
> >From: Fletcher Mattox [mailto:[EMAIL PROTECTED]
> >Sent: Monday, May 21, 2007 11:57 PM
> >To: users@spamassassin.apache.org
> >Subject: Bayes problem: very large spam/ham ratio
> >
> >
> >Hi,
> >
> >After years of stability, my bayes db is doing poorly.  When I first
> >noticed it, it was classifying lots of ham as BAYES_99, so I cleared
> >the db and started over.  Now it finds *very* few ham.
> >
> >0.000          0          3          0  non-token data: bayes db version
> >0.000          0      14779          0  non-token data: nspam
> >0.000          0         86          0  non-token data: nham
> >0.000          0     231925          0  non-token data: ntokens
> >0.000          0 1177142672          0  non-token data: oldest atime
> >0.000          0 1179789654          0  non-token data: newest atime
> >0.000          0 1179789681          0  non-token data: last journal sync atime
> >0.000          0 1179761284          0  non-token data: last expiry atime
> >0.000          0      43200          0  non-token data: last expire atime delta
> >0.000          0      90881          0  non-token data: last expire reduction count
> >
> >I've seen people report large spam/ham ratios on this list, but this
> >seems extreme: >170:1.  So I added about 500 ham (I am sure of the
> >quality) to the db with "sa-learn --ham", hoping that would help.
> >But it is still behaving poorly: over 20% of my ham is BAYES_99.
> >(Normally less than 1% of my ham is BAYES_99.)
> >
> >Does anyone know why my system can't find any ham?  It's a fairly
> >typical university site of about 10000 messages/day with a 50/50
> >ham/spam ratio, so I know it is receiving plenty of ham.  Running
> >3.2.0 if it matters.
> >
> >Thanks,
> >Fletcher
