On Fri, Feb 08, 2008 at 02:02:45PM +0100, Paolo Cravero wrote:
> Arthur Dent wrote:
>
>> Hmmm... Not delete exactly, but the sa-learn job takes so long that the
>> archivemail job has kicked off and finds the "TempSpam" and "TempHam" mboxes
>> in the Mail directory and dutifully chops out anything older than 180 days. I
>> didn't think that that would be a problem, but maybe it's upsetting sa-learn?
>> I will try switching the order of the jobs (archivemail running first) and
>> see if that makes a difference.
>
> At this point you have probably already swapped the two processes.
>
> I think sa-learn or the process feeding it does not like the chopping.

Yes. Sorry, I didn't post an update because I was embarrassed at my own
stupidity for not thinking it through more carefully before posting my
original message. Switching the jobs round did indeed mean that sa-learn is
no longer interfered with by archivemail while it's in mid-learn. It now
behaves quite sensibly.

>> Well, as I explained in my previous post, the "TempHam" folder is a
>> concatenation of all my non-spam folders. Mail that is older than 180 days is
>> taken off at one end and new mail (c. 30-40 per day) added on at the other.
>> The total remains roughly constant.
>
> Don't forget that sa-learn remembers which messages have been learned. Once
> your old messages have all been learned, you need to feed to it only new
> arrivals, that is since the last sa-learn run. No need to keep 180 days
> worth of ham and spam in the temp folder!

Yes, I understand that. It's not that I *keep* a temp folder of spam/ham; I
don't. I know that it only needs to learn the *new* mails. It's just that I'm
basically lazy, and it seemed far easier for me simply to take all my
non-spam folders and copy them together into one big temporary file, run
sa-learn on it and then delete the temporary file, e.g.:

    #!/bin/bash
    # Concatenate every non-spam mbox into one temporary mbox
    cat ~/mail/mailinglists/* ~/mail/WorkStuff/* ~/mail/Admin/* > ~/mail/TempHam
    # Learn the lot as ham, then throw the temporary file away
    sa-learn --ham --mbox ~/mail/TempHam
    rm ~/mail/TempHam

I'm no bash-scripting wiz (that much should be obvious!) so I could think of
no *easy* way to strip only today's mails out of my 20-odd folders and just
feed those to sa-learn. My way, I need to do nothing myself, the job takes
about half an hour, and I'm asleep when it happens... OK, sa-learn has to
work a bit harder than it needs to, but hey, better it than me!

The 180 days thing is because I choose to keep only the last 6 months
(approx.) of mail in each of my 20 or so folders, the rest being compressed
into a gzip archive using "Archivemail" (a very neat little utility, btw),
and 180 days is its default setting (see, I told you I was lazy!).
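
For what it's worth, here is a rough sketch of one lazy middle ground: instead
of stripping out individual new mails, feed sa-learn only the folders whose
mbox file has changed since the previous run. This is an untested idea, not
what I actually run. The marker file STAMP, the SA_LEARN override variable and
both function names are made up for the illustration; only "sa-learn --ham
--mbox" and "find -newer" are real.

```shell
#!/bin/bash
# Sketch only: learn ham from folders modified since the last run.
# STAMP is a hypothetical marker file; its mtime records when the previous
# run started. SA_LEARN is a hypothetical override, handy for a dry run.
STAMP="${STAMP:-$HOME/.sa-learn-last-run}"
SA_LEARN="${SA_LEARN:-sa-learn}"

# Print every regular file under the given directories that is newer than
# the marker (or every file, if there has been no previous run).
changed_folders() {
    if [ -e "$STAMP" ]; then
        find "$@" -type f -newer "$STAMP"
    else
        find "$@" -type f
    fi
}

# Feed each changed mbox to sa-learn, then move the marker forward.
learn_changed_ham() {
    local new_stamp="$STAMP.new" f
    # Take the timestamp before scanning, so mail that arrives while the
    # job is running is picked up by the next run instead of being missed.
    touch "$new_stamp"
    changed_folders "$@" | while IFS= read -r f; do
        "$SA_LEARN" --ham --mbox "$f" || exit 1
    done || return 1
    # Only reached if every folder was learned successfully.
    mv "$new_stamp" "$STAMP"
}
```

It would be called as "learn_changed_ham ~/mail/mailinglists ~/mail/WorkStuff
~/mail/Admin". Anything that rewrites an mbox (archivemail trimming it, or a
mail client updating flags) also bumps its modification time, so that folder
gets fed in whole again; that is harmless, since sa-learn skips messages it
has already learned.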

> Let sa-learn complete and then chop the folder. Just concatenate the
> process rather than schedule it in crontab. It should fix your apparent
> weirdness.
>
> Paolo

Thanks for all the help and suggestions. Much appreciated...

Mark
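
P.S. Paolo's suggestion of chaining the two steps, rather than giving each its
own crontab entry, might look roughly like the sketch below. It is only an
illustration under assumptions: the function name and the SA_LEARN and
ARCHIVEMAIL override variables are invented here, and "--days=180" merely
spells out the archivemail default mentioned above.

```shell
#!/bin/bash
# Sketch only: run the learn step and the archive step back to back in ONE
# job, so archivemail cannot start while sa-learn is still reading the mbox.
# SA_LEARN and ARCHIVEMAIL are hypothetical overrides, handy for a dry run.
SA_LEARN="${SA_LEARN:-sa-learn}"
ARCHIVEMAIL="${ARCHIVEMAIL:-archivemail}"

# Usage: nightly_mail_job MAILDIR MBOX...
nightly_mail_job() {
    local maildir="$1"
    local tmp="$maildir/TempHam"
    local status
    shift    # the remaining arguments are the ham folders (mbox files)

    # Build the temporary mbox and learn it as ham.
    cat "$@" > "$tmp" || return 1
    "$SA_LEARN" --ham --mbox "$tmp"
    status=$?
    rm -f "$tmp"

    # If learning failed, leave the folders untouched for tonight.
    [ "$status" -eq 0 ] || return "$status"

    # Learning has finished, so it is now safe to chop the folders.
    "$ARCHIVEMAIL" --days=180 "$@"
}
```

A single crontab entry would then call something like "nightly_mail_job
~/mail ~/mail/mailinglists/* ~/mail/WorkStuff/* ~/mail/Admin/*", and the order
of the two steps no longer depends on how long sa-learn happens to take.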