On Sat, Jul 04, 2009 at 11:53:27AM +0300, Jari Fredriksson wrote:
> > Hello,
> >
> > I currently get several thousand shop/meds/pill/gen spams a day, and
> > some get through my filters. I have to move them to my spam folder
> > manually and feed them to "sa-learn --spam", but this does not work...
> >
> > ...because the spammer's From: address is in the auto_whitelist.
> >
> > To me this looks like a bug, because sa-learn should remove the From:
> > address from the auto_whitelist and then RESCAN this crap.
> >
> > Over the last two days I uncompressed the spam archives from the last
> > 27 weeks (of this year), used "formail" to extract all From: addresses,
> > deduplicated them, and ran
> >
> >     for FROM in ${LIST} ; do
> >         spamassassin --remove-addr-from-whitelist="${FROM}"
> >     done
> >
> > which took over 52 hours for 487000 e-mail addresses. Hell, I have a
> > super fast machine with 15000 RPM SCSI drives and 32 GByte of memory.
> > That is 2.6 addresses per second...

You are loading a big Perl program for every single address, what do you
expect? ;)

You should edit the database directly. If you are not using SQL, it's a bit
trickier; you could modify trim_whitelist to do it, for example.

> Do you have an SQL-based AWL? If not, it might be worth considering,
> given your volume of email.
> 
> With SQL
> 
>      for FROM in ${LIST} ; do
>          mysql -u spamassassin -psecret spamassassin \
>              -e "delete from awl where email='${FROM}' ;"
>      done
> 
> Should be MUCH faster.

$FROM may contain quote characters, so they should be escaped before the
address goes into the statement. That's always good practice, even though I
doubt any of these addresses contain SQL injections.

You could also write all the SQL statements into a file first and then run
that file in one go. That avoids the same pitfall as above (starting a new
program for every address), though on a smaller scale. ;)
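
Something along these lines might do it. This is only a rough sketch: the
function name make_awl_deletes and the file names addresses.txt and
awl-cleanup.sql are made up for illustration, and it assumes one address per
line and MySQL's default handling of backslashes. Single quotes and
backslashes are doubled so that an address cannot end the string literal.

```shell
#!/bin/sh
# Sketch only: make_awl_deletes, addresses.txt and awl-cleanup.sql are
# hypothetical names, not part of SpamAssassin.

# Read one address per line on stdin, write one DELETE statement per
# address on stdout.
make_awl_deletes() {
    # 1. drop empty lines
    # 2. double every backslash (MySQL treats it as an escape by default)
    # 3. double every single quote so it stays inside the literal
    # 4. wrap the escaped address in the DELETE statement
    sed -e '/^$/d' \
        -e 's/\\/\\\\/g' \
        -e "s/'/''/g" \
        -e "s/.*/DELETE FROM awl WHERE email='&';/"
}

# Intended use (left commented out, since it needs a live database):
#   make_awl_deletes < addresses.txt > awl-cleanup.sql
#   mysql -u spamassassin -p spamassassin < awl-cleanup.sql
```

That way the mysql client starts once for the whole list instead of once per
address, and you can look over awl-cleanup.sql before running it.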
