> --On 02/04/05 09:17:55 -0400 Peter Marshall wrote:
>
> > My question is the same as Henrik, I have a bunch of email that is
> > spam (either tagged by spam assassin or not tagged at all. I forwared
> > it as an attachment to a "spam" mail box. What do I have to do now
> > before I can get bayes to learn the message ... I read you have to
> > remove the headers .... Could anyone give me a little more detail ?
>
> There's no 100% good way to do this; it depends on how the
> message was mangled by the client (and possibly server). The
> only guaranteed way is (as I described) to save a copy at the
> same point as it is inspected by SpamAssassin so you can use it later.
>
> That being said, forwarding a message as an attachment will
> usually preserve the headers pretty well. The perl MailTools
> and MIME-tools modules have procedures to pull out
> attachments and save them in the Unix format which sa-learn wants.
>
> Sorry I don't have any ready-made scripts for this; my users
> dump messages into shared IMAP mailboxes which don't need any
> preprocessing before being fed to sa-learn.
>
> -Kevin
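
For reference, the attachment extraction mentioned above could be sketched roughly as follows. This is only an illustration using Python's standard email and mailbox modules instead of the Perl MailTools / MIME-tools modules Kevin names; the function names extract_forwarded and save_to_mbox are made up for this example, and it assumes the client forwarded the spam as a message/rfc822 attachment.

```python
import email
import mailbox
from email import policy


def extract_forwarded(raw_bytes):
    """Return every message found as a message/rfc822 attachment."""
    outer = email.message_from_bytes(raw_bytes, policy=policy.compat32)
    found = []
    # walk() also descends into attached messages, so a forward
    # nested inside another forward would be returned as well.
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            # The parser stores the attached message as a one-item list.
            if isinstance(payload, list) and payload:
                found.append(payload[0])
    return found


def save_to_mbox(messages, path):
    """Append the messages to a Unix mbox file (hypothetical helper)."""
    box = mailbox.mbox(path)  # created if it does not exist yet
    box.lock()
    try:
        for msg in messages:
            box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()
```

The resulting file should then be usable as input for sa-learn in mbox mode. Note that this re-serialises the attached message, so it gives no more guarantee than the forward itself that the bytes match what SpamAssassin originally saw.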

Basically, I've got two options.

All mail that is received is backed up on the mail server before any headers are added. I could match those copies against the mail received in the spam-learn and ham-learn accounts. However, mail is only kept in that backup for a limited time before being moved, after which the mail server no longer has access to it. So unless people report mail that found its way through the filters very regularly, it won't be a foolproof solution.

The other option sounds more viable. I would only need to strip off the X-Scanned-By, X-Spam-* and X-Sanitized headers (which Bayes ignores in my setup anyhow), BUT I have no guarantee that the message is in its original format. The mail server may rewrite MIME boundaries (where necessary) and converts 8bit to 7bit where possible. And I think there are many client-side mail filtering engines, spam scanners and virus scanners out there that may do some rewriting as well.

From the above, I'm not sure that training SpamAssassin on forwarded messages, which may or may not be in the same format as when SpamAssassin first received them, is a good idea. But I don't know enough about SpamAssassin's internal workings and its Bayes filter to be sure...

Kind Regards,

Sander Holthaus
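
P.S. A minimal sketch of the header stripping for the second option, in Python. It works on the raw bytes and only drops the X-Scanned-By, X-Sanitized and X-Spam-* header lines (including their folded continuation lines), so the rest of the message is passed through untouched. The function name strip_filter_headers is just an illustration, and it cannot undo any rewriting a client or server has already done.

```python
import re

# Header names to drop, matched case-insensitively at the start of a line.
STRIP = re.compile(
    rb"(x-scanned-by|x-sanitized|x-spam-[^:\s]*)\s*:", re.IGNORECASE
)


def strip_filter_headers(raw):
    """Remove scanner headers from a raw message, leaving the body as-is."""
    # The header block ends at the first empty line.
    m = re.search(rb"\r?\n(?=\r?\n)", raw)
    if m is None:
        head, rest = raw, b""
    else:
        head, rest = raw[:m.end()], raw[m.end():]

    kept = []
    dropping = False
    for line in head.splitlines(keepends=True):
        if line[:1] in (b" ", b"\t"):
            # Folded continuation: shares the fate of its header.
            if not dropping:
                kept.append(line)
            continue
        dropping = bool(STRIP.match(line))
        if not dropping:
            kept.append(line)
    return b"".join(kept) + rest
```

Working on bytes rather than parsing and re-generating the message is deliberate: a parser may refold long headers on output, which would be one more source of differences from the original.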