Greetings;

About an hour ago, based on some comments that the bayes database needs to be 
trained on ham as well as spam, and because it seemed to be forgetting some 
of the stuff I'd fed it as spam, I rewrote the filter rule in kmail to 
launch sa-learn with one of my sorted mailing list directories as the 
argument.  The syntax is otherwise the same as the sa-learn-spam filter.

sa-learn --spam can process a message in 5 to 10 seconds or so, so if I've 
dropped 20 doofus mails in the spam directory and fire it off, I have it done 
and kmail is back among the living in 2-3 minutes.

But feeding it a 'ham' directory with about 7k messages in it turned 
sa-learn into a 100% cpu hog, incrementing the messages-processed count only 
about every 3 to 5 minutes.  I couldn't kill it; it kept coming back, and I 
must have fed it a kill -9 50 times.  Finally, one of the kills killed X too!  
But no console came back, so I had to hit the reset button.  The reboot was 
like molasses in January, so I did a power down, same story.  Same story 3 
times running, so I went and made a sandwich while it sat powered down.  Then 
the reboot was normal up to e2fscking a 372GB drive I use for amanda, the 
backup proggy.  That hung, with no indication of progress for about 20 
minutes, no marching **** or anything.  But it finally fell through and 
completed the bootup, and is running normally now but it has taken the 
majority of an hour to do this.

So what is the maximum number of files in a directory that one can feed to 
sa-learn --ham and expect it to run at normal speed?  I vaguely recall 
feeding it my corpus from another folder it was having trouble with a year 
ago, the linux-usb list, with 600 to 1k messages in it, and it finished in an 
hour that time.

The command that kmail issues to it is:
sa-learn --ham  /root/Mail/(foldername)/cur

Where foldername is whatever mailing list I want to tell it is ham.

Is this correct?  I've had it set up that way for 2 or 3 years at least, and 
until now it hasn't been much of a problem.
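
For reference, here is a minimal, untested sketch of how the same folder 
could be fed to sa-learn in smaller batches instead of one 7k-message run.  
The function name, the SA_LEARN and BATCH variables, and the batch size of 
200 are assumptions made up for illustration; --no-sync and --sync are 
sa-learn's documented switches for skipping the database sync and running it 
later.  Whether this would actually avoid the slowdown is a guess.

```shell
#!/bin/sh
# Hypothetical helper: learn a maildir folder as ham in small batches.
# SA_LEARN and BATCH are illustrative names, not part of SpamAssassin.
SA_LEARN="${SA_LEARN:-sa-learn}"
BATCH="${BATCH:-200}"

learn_ham_in_batches() {
    dir="$1"
    # Pass at most $BATCH message files to each sa-learn run, at the
    # lowest CPU priority so the desktop stays usable.
    # --no-sync skips the sync step after each run.
    find "$dir" -type f -print0 |
        xargs -0 -n "$BATCH" nice -n 19 "$SA_LEARN" --ham --no-sync
    # Sync the Bayes database once, after all batches are done.
    "$SA_LEARN" --sync
}
```

It would be called from a terminal rather than from kmail, for example:
learn_ham_in_batches "/root/Mail/(foldername)/cur"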

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
"What a wonder is USENET; such wholesale production of conjecture from
such a trifling investment in fact."
-- Carl S. Gutekunst
