Anthony Kamau wrote:
> Hello all.
>
> I'm faced with a dilemma on how to use sa-learn with mail forwarded from
> a user's inbox on Exchange to the sendmail server.  Since we just
> recently started using sendmail as a front end server, our bayes system
> is still in its infancy and spam is getting through to user inboxes with
> scores lower than our threshold of 10 and thus not being clearly
> identified as spam on the subject line.  My intention is to have a user
> forward spam back to sendmail server and use sa-learn to help the
> scoring system get better fast.
>
> Here's what I've done so far:
> I have created two email addresses for this purpose;
> [EMAIL PROTECTED] for spam and [EMAIL PROTECTED] for false
> positives.  I have created a connector that forwards all email destined
> for mail.domain.com back to the sendmail server and messages are getting
> into the appropriate mailboxes.
>
> The next step is what has me stumped - is there a standard marker to
> look out for that segregates the attachment from the mail sending the
> attachment?
>   
Standard? There's nothing that's standard about forwarding email.

That said, if you're just doing a "forward as attachment" type
operation, you should be able to use any standard MIME attachment
extractor tool to pull the original message back out.

e.g.: http://search.cpan.org/dist/ppt/bin/mimedecode
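
If you'd rather script it than use an existing tool, here is a rough
sketch of the same idea using Python's standard "email" package. It
assumes the forwarding client attaches the original as a message/rfc822
part, which you should verify for your mail client. The function name
extract_forwarded is made up for this example, and the output is
re-serialised by the library, so it may not be byte-for-byte identical
to what originally arrived.

```python
import email


def extract_forwarded(raw_bytes):
    """Return each message/rfc822 attachment found in raw_bytes, as bytes.

    Assumption: the original mail was forwarded "as attachment", so it is
    carried inside the wrapper as a message/rfc822 MIME part.
    """
    outer = email.message_from_bytes(raw_bytes)
    originals = []
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a list holding
            # exactly one Message object: the attached original.
            inner = part.get_payload(0)
            originals.append(inner.as_bytes())
    return originals
```

Each item returned could then be written to a file and handed to
sa-learn (--spam or --ham as appropriate). If the list comes back empty,
the user most likely did an ordinary inline forward instead.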

If you're using an ordinary "forward", don't bother. The message has
been completely rebuilt and only has a visible-text resemblance to the
original. Generally a normal "forward" does the following. Any one of
these changes makes it more-or-less a different message as far as SA is
concerned, and the header changes are pretty catastrophic unless you can
do major reconstruction.

1) discard ALL of the original message headers, and build new ones,
copying a minimal amount of text:
    -The message is now From: the forwarder, not the spammer.
    -All of the Received: headers are new.
    -Any out-of-the-ordinary headers are generally gone (e.g. X-Id,
X-Originating-IP, etc.)
    -Even the subject is generally changed to include "Fwd:" or
something similar.
    -Obviously the X-Mailer and/or User-Agent is replaced with the one
for your MUA, not the original.

2) Significant changes to the body text:
    - For multipart/alternative messages, many mail clients will discard
the original text/plain, and build a new one based on the contents of
the text/html
    - Most add some kind of "Forwarded message follows" text
    - Most will re-do any character encodings. e.g. a message that was
base64 encoded will probably no longer be.
    - Most will re-do line-wraps to suit their own tastes.
    - All will generate completely new mime boundaries which will
generally not be remotely similar to the originals.
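
To see how much of the header damage described above applies to your own
mail client, you could compare a saved copy of an original spam against
the same message after an ordinary forward. This is only an illustrative
sketch using Python's standard "email" package; compare_headers is a
made-up helper name, and it looks at headers only, not the body changes.

```python
import email


def compare_headers(original_raw, forwarded_raw):
    """Return (lost, changed) header names, lower-cased and sorted.

    lost:    headers present in the original but absent after forwarding.
    changed: headers present in both whose values differ.
    """
    orig = email.message_from_bytes(original_raw)
    fwd = email.message_from_bytes(forwarded_raw)
    orig_names = {name.lower() for name in orig.keys()}
    fwd_names = {name.lower() for name in fwd.keys()}
    lost = sorted(orig_names - fwd_names)
    # get_all() matches header names case-insensitively.
    changed = sorted(
        name for name in orig_names & fwd_names
        if orig.get_all(name) != fwd.get_all(name)
    )
    return lost, changed
```

With a typical inline forward you would expect nearly every header to
show up in one list or the other.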
