GRP Productions wrote on Mon, 14 Mar 2005 03:41:40 +0200:

> Indeed, this is the CVS version :-) 

I have been trying to get something from CVS for several days now, no luck.

> This is perhaps because I have been using only 'mistake-based' training (ie 
> training only when false classification happens). However this used to work 
> fine. 

Bayes needs constant training, but this doesn't mean it needs any manual 
training. Once it's up and running and "well-greased" it should take care of 
itself by auto-learning (bayes_auto_learn 1, don't know if on by default). 
About 70 or 80% of our spam and ham (especially the spam) is autolearned.

>  
> >your "hold time" is quite low, it's about a month. I think we have tokens 
> >from even a year ago. That's maybe a bit too much, but I strongly suggest 
> >upping your bayes_expiry_max_db_size to something like 500000 or so. Since 
> >you have a much higher flux of messages than we have on that machine you are 
> >literally "burning" your db to uselessness. 
>  
> So what would you suggest? I certainly don't want to lose everything that has 
> been learned till now. 

Actually, with those "few" tokens you won't lose much if you throw it away ;-) 
As I said, upping that should help; no need to throw it away unless you think 
that's easier (if most spam you get scores at BAYES_50 it might be better to 
start over than to convince the db that it's spam).

> Nope, there is definitely only the one coming with MS. I never use SA from 
> the command line anyway.

Well, let's go back:
You sa-learn a message, it says it learned, you dump magic and see there's no 
change, you look in the directory and there's no journal. There *has* to be at 
least one additional Bayes db. Or something is happening that I haven't come 
across in my roughly three years of using SA+Bayes. What's the output of 
"sa-learn --dump magic"? Don't specify a config file!
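To make the "dump, learn, dump again" check less error-prone, here is a rough Python sketch that compares two saved copies of the dump magic output. It assumes the usual layout, where each header line ends in "non-token data: <name>" and the value sits in the third column; the sample numbers are made up and not taken from this thread, and the function names are only illustrative.

```python
def parse_dump_magic(text):
    """Map each 'non-token data' name to its integer value (third column)."""
    values = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue  # skip token lines and anything unexpected
        fields, name = line.split("non-token data:", 1)
        columns = fields.split()
        if len(columns) >= 3:
            values[name.strip()] = int(columns[2])
    return values


def changed_counters(before, after, names=("nspam", "nham", "ntokens")):
    """Return only the counters whose value differs between two dumps."""
    return {
        n: (before.get(n), after.get(n))
        for n in names
        if before.get(n) != after.get(n)
    }


# Made-up sample output for illustration only.
BEFORE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        120          0  non-token data: nspam
0.000          0         80          0  non-token data: nham
0.000          0      15000          0  non-token data: ntokens
"""
AFTER = BEFORE  # dump taken after "learning" one message, unchanged

diff = changed_counters(parse_dump_magic(BEFORE), parse_dump_magic(AFTER))
if not diff:
    print("no change: the learn probably went to a different db")
else:
    print("changed:", diff)
```

If the counters stay identical after a learn that reported success, that points at sa-learn and the scanner using different db paths.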
 
> bayes_path              /var/spool/MailScanner/bayes/bayes 

and what's in your /etc/mail/spamassassin/local.conf?

> bayes_auto_expire 0
ok, that means it won't expire. Of course, if it doesn't grow this isn't 
necessary ... ;-)

> bayes_expiry_max_db_size 500000
I assume you just added/changed that?

> If I get it you mean that the tokens are lost very quickly?

Yes. However, now that I know that your Bayes expiry is off, we have a 
different case. Since when has it been off? Since Feb. 11, as your dump magic 
suggests? Your oldest token is from Feb. 2. So either you started the db that 
day or you are burning your tokens in 10 days. That's one problem; raising the 
ceiling, as you already did, should take care of it. The other problem is that 
the db is apparently not growing. One reason is, of course, that you only learn 
on mistakes. So, how often is that done? How many messages do you actually add 
this way? The second part of this other problem is that even when you do learn, 
nothing seems to get learned. I don't see any other explanation than that 
different dbs are being used.
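The "burning tokens in N days" estimate is just the gap between the "oldest atime" and "newest atime" values from dump magic, which are Unix timestamps. A small Python sketch of that arithmetic, with illustrative timestamps that are not the real values from your db:

```python
SECONDS_PER_DAY = 86400


def token_span_days(oldest_atime, newest_atime):
    """Days covered by the tokens currently held in the db."""
    return (newest_atime - oldest_atime) / SECONDS_PER_DAY


# Illustrative epoch values, exactly ten days apart.
oldest = 1107302400
newest = oldest + 10 * SECONDS_PER_DAY

print(token_span_days(oldest, newest))  # 10.0
```

A span that stays short on a db that has been running for months suggests tokens are being expired almost as fast as they come in.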

> I think I am confused: if bayes works with tokens, why does it need nspam 
> and nham? Or are they just counters? 

It's just the number of spam and ham messages you have trained it with. Yes, 
it's more or less informational only.

>  
> In general, do you think that setting bayes_expiry_max_db_size would be 
> enough? 

To cure the fast expiration, yes, but you didn't expire for the last 30 days, 
anyway.

> One final thing: Why, even if I manually expire, does the date of last 
> expiration remain old?

Same reason as above: you work on different dbs. What does the expire output 
show?


Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org


