I used to have a setup where IMAP users could put mail into either a SPAM or a
Junk mailbox to have it auto-trained. A script then stepped through those
mailboxes and did the training; it also processed non-new mail in the inbox as
ham.

#!/bin/bash
# {new,cur} below is brace expansion, so this needs bash rather than plain sh.

USERROOT="$HOME"
MAILP="Maildir"

J_PATH="$USERROOT/$MAILP/.Junk"
S_PATH="$USERROOT/$MAILP/.SPAM"
H_PATH="$USERROOT/$MAILP/cur"

# Mail the user filed into Junk or SPAM is learned as spam.
if [ -d "$J_PATH" ]; then
   /usr/local/bin/sa-learn --spam --progress "$J_PATH"/{new,cur}
fi

if [ -d "$S_PATH" ]; then
   /usr/local/bin/sa-learn --spam --progress "$S_PATH"/{new,cur}
fi

# Already-seen mail left in the inbox (cur/) is learned as ham.
if [ -d "$H_PATH" ]; then
   /usr/local/bin/sa-learn --ham "$H_PATH"
fi

This all worked fine, but it was very resource-intensive, and it only worked
for the handful of shell users. I tried running it manually a few times for
the virtual users, but I ended up with a process that ground the machine to a
halt and a Bayes database that was massively large (several GB).

So, other than throwing more iron at the problem, is there something I can do
to make this process a little smarter, so that it works with the virtual users
without generating a massive database file?

-- 
'What can I do? I'm only human,' he said aloud.  Someone said, Not all
of you. --Pyramids
