> --On 02/04/05 09:17:55 -0400 Peter Marshall wrote:
> > My question is the same as Henrik's. I have a bunch of email that is
> > spam (either tagged by SpamAssassin or not tagged at all). I forwarded
> > it as an attachment to a "spam" mailbox. What do I have to do now
> > before I can get bayes to learn the message ... I read you have to
> > remove the headers .... Could anyone give me a little more detail?
> 
> There's no 100% good way to do this; it depends on how the 
> message was mangled by the client (and possibly server).  The 
> only guaranteed way is (as I described) to save a copy at the 
> same point as it is inspected by SpamAssassin so you can use it later.
> 
> That being said, forwarding a message as an attachment will 
> usually preserve the headers pretty well.  The perl MailTools 
> and MIME-tools modules have procedures to pull out 
> attachments and save them in the Unix format which sa-learn wants.
> 
> Sorry I don't have any ready-made scripts for this; my users 
> dump messages into shared IMAP mailboxes which don't need any 
> preprocessing before being fed to sa-learn.
> 
>       -Kevin

Basically, I've got two options. All mail that is received is backed up on
the mailserver before any headers are added. I could match those copies
with mail received in the spam-learn and ham-learn accounts. However, mail
is only backed up for a limited amount of time before being moved, after
which the mailserver no longer has access to it. So unless people report
mail that found its way through the filters on a very regular basis, it
won't be a foolproof solution.
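
A minimal sketch of that matching step in Python, assuming the reported
message has already been pulled out of the forward and that the Message-ID
header survived the trip through the client. The function names are
hypothetical, and matching on Message-ID is my own assumption, not something
SpamAssassin prescribes:

```python
from email import message_from_bytes
from typing import Dict, List, Optional


def message_id(raw: bytes) -> Optional[str]:
    """Return the stripped Message-ID of a raw message, or None if absent."""
    value = message_from_bytes(raw).get("Message-ID")
    return str(value).strip() if value else None


def match_backups(reported: List[bytes], backups: List[bytes]) -> List[bytes]:
    """For each reported message, return the pristine backup copy that
    carries the same Message-ID. Reports without a match are skipped."""
    index: Dict[str, bytes] = {}
    for raw in backups:
        mid = message_id(raw)
        if mid is not None:
            # Keep the first backup seen for a given Message-ID.
            index.setdefault(mid, raw)

    matched = []
    for raw in reported:
        mid = message_id(raw)
        if mid in index:
            matched.append(index[mid])
    return matched
```

The backup copies returned here are the ones to feed to sa-learn, since they
are what the server saw before any headers were added. Anything reported
after the backup has been moved away simply produces no match.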

The other option sounds more viable: I would only need to strip off the
X-Scanned-By, X-Spam-* and X-Sanitized headers (which are ignored in my
setup for bayes anyhow), BUT I have no guarantee that the message is in its
original format. Some MIME boundary rewriting may be done by the mailserver
(where necessary), as is converting 8bit to 7bit where possible. And I think
that there are many client-side mail filtering engines, spam scanners and
virus scanners out there that may do some rewriting as well.
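
A rough sketch of this second option in Python, using only the standard
email package. It assumes the client forwarded the spam as a message/rfc822
attachment; the function names are hypothetical, and the header list is just
the one from my setup. Note that re-serializing the parsed message is not
guaranteed to be byte-identical to what was attached:

```python
from email import message_from_bytes
from email.message import Message
from typing import List

# Headers added by the local filters (assumed list, adjust to your setup).
STRIP_EXACT = {"x-scanned-by", "x-sanitized"}
STRIP_PREFIX = "x-spam-"


def attached_messages(part: Message) -> List[Message]:
    """Collect messages attached as message/rfc822, without descending
    into the attached messages themselves."""
    if part.get_content_type() == "message/rfc822":
        payload = part.get_payload()
        return [payload[0]] if isinstance(payload, list) and payload else []
    found: List[Message] = []
    if part.is_multipart():
        for sub in part.get_payload():
            found.extend(attached_messages(sub))
    return found


def strip_filter_headers(msg: Message) -> None:
    """Remove X-Scanned-By, X-Sanitized and all X-Spam-* headers in place."""
    for name in list(msg.keys()):
        lower = name.lower()
        if lower in STRIP_EXACT or lower.startswith(STRIP_PREFIX):
            del msg[name]  # deletes every occurrence of this header


def clean_forward(raw_forward: bytes) -> List[bytes]:
    """Return each attached message as raw bytes with filter headers removed."""
    cleaned = []
    for msg in attached_messages(message_from_bytes(raw_forward)):
        strip_filter_headers(msg)
        cleaned.append(msg.as_bytes())
    return cleaned
```

Each item returned by clean_forward() could be written to its own file and
passed to sa-learn --spam (or --ham). This only undoes the headers; it cannot
undo any boundary or 8bit/7bit rewriting done along the way.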

Given the above, I'm not sure that training SpamAssassin with forwarded
messages that may or may not be in the same format as when SpamAssassin
first received them is a good idea. But I don't know enough about
SpamAssassin's internal workings and its bayes filter to be sure...

Kind Regards,
Sander Holthaus
