On Fri, Feb 08, 2008 at 02:02:45PM +0100, Paolo Cravero wrote:
> Arthur Dent wrote:
>
>> Hmmm... Not delete exactly, but the sa-learn job takes so long that the
>> archivemail job has kicked off and finds the "TempSpam" and "TempHam" mboxes
>> in the Mail directory and dutifully chops out anything older than 180 days. I
>> didn't think that that would be a problem, but maybe it's upsetting sa-learn?
>> I will try switching the order of the jobs (archivemail running first) and
>> see if that makes a difference.
>
> At this point you have probably already swapped the two processes.
>
> I think sa-learn or the process feeding it does not like the chopping.

Yes. Sorry, I didn't post an update because I was embarrassed at my own
stupidity for not thinking it through more carefully before posting my
original message. Switching the jobs round did indeed mean that sa-learn
is no longer getting interfered with by archivemail while it's in
mid-learn. It now behaves quite sensibly.

>> Well, as I explained in my previous post, the "TempHam" folder is a
>> concatenation of all my non-spam folders. Mail that is older than 180 days is
>> taken off at one end and new mail (c. 30-40 per day) added on at the other.
>> The total remains roughly constant.
>
> Don't forget that sa-learn remembers which messages have been learned. Once 
> your old messages have all been learned, you need to feed to it only new 
> arrivals, that is since the last sa-learn run. No need to keep 180 days 
> worth of ham and spam in the temp folder!

Yes I understand that. It's not that I *keep* a temp folder of spam/ham, I
don't. I know that it only needs to learn the *new* mails. It's just that
I'm basically lazy, and it seemed far easier for me simply to take all
my non-spam folders and copy them together into one big temporary
file, run sa-learn on it and then delete the temporary file, e.g.:

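#!/bin/bash
# Concatenate the non-spam mbox folders into one temporary mbox.
cat ~/mail/mailinglists/* ~/mail/WorkStuff/* ~/mail/Admin/* > ~/mail/TempHam
# Learn it as ham, then remove the temporary file.
sa-learn --ham --mbox ~/mail/TempHam
rm ~/mail/TempHam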

I'm no bash-scripting wiz (that much should be obvious!) so I could
think of no *easy* way to strip only today's mails out of my 20-odd
folders and just feed those to sa-learn. My way, I need to do nothing
myself, the job takes about half an hour, and I'm asleep when it
happens... OK, sa-learn has to work a bit harder than it needs to, but
hey, better it than me!
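
Since sa-learn remembers what it has already learned, one possible variation
is to skip the temporary file altogether and hand each folder to sa-learn in
turn, which leaves no "TempHam" lying around for archivemail to trip over.
This is only a rough sketch: the function name learn_ham_folders and the
SA_LEARN override variable are made up for illustration, and it assumes every
folder is a plain mbox file as in the script above.

```shell
#!/bin/bash
# Sketch only: feed each mbox folder to sa-learn directly, no temporary file.
# SA_LEARN is a hypothetical override so a different binary can be used.
SA_LEARN="${SA_LEARN:-sa-learn}"

learn_ham_folders() {
    local mbox
    for mbox in "$@"; do
        # Skip anything that is not a regular, non-empty file.
        if [ ! -f "$mbox" ] || [ ! -s "$mbox" ]; then
            continue
        fi
        # Stop at the first folder that sa-learn fails on.
        "$SA_LEARN" --ham --mbox "$mbox" || return 1
    done
}

# Example call, using the folders from the script above:
# learn_ham_folders ~/mail/mailinglists/* ~/mail/WorkStuff/* ~/mail/Admin/*
```

Messages that were learned on an earlier night are skipped by sa-learn, so the
run still covers every folder but only the new arrivals change anything.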

The 180 days thing is because I choose to keep only the last 6 months or so
of mail in each of my 20 or so folders, the rest being compressed into a gzip
archive using "Archivemail" (a very neat little utility, btw), and 180 days is
its default setting (see, I told you I was lazy!).

> Let sa-learn complete and then chop the folder. Just concatenate the 
> process rather than schedule it in crontab. It should fix your apparent 
> weirdness.
>
> Paolo
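
For the archives, a minimal sketch of what chaining the jobs could look like:
one wrapper called from a single crontab entry, so the second job cannot start
until the first has finished. The function name run_in_order and the two
script paths in the example call are hypothetical placeholders, not anything
from my actual setup.

```shell
#!/bin/bash
# Sketch only: run the nightly jobs one after another instead of giving
# each its own crontab entry. Every argument is a path to an executable.
run_in_order() {
    local job
    for job in "$@"; do
        # A job starts only after the previous one has exited successfully.
        if ! "$job"; then
            echo "job failed: $job" >&2
            return 1
        fi
    done
}

# Example call (hypothetical script names), sa-learn first, then the chop:
# run_in_order ~/bin/learn-mail.sh ~/bin/archive-mail.sh
```

The order of the arguments is the order of execution, so swapping the two
jobs means swapping two words rather than juggling crontab times.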

Thanks for all the help and suggestions. Much appreciated...

Mark
