hi-
we have a system [zimbra] where users can select a message in the mua
interface and click a "spam" or "not spam" button. this generates a new
message [containing the selected message] which is ultimately delivered
to a mailbox. i intend to retrieve these messages via imap and feed them
to sa-learn, but they've been a bit adulterated by the time they're
retrieved, so i suspect some cleanup is needed first.
here are two samples:
http://dpaste.com/0B6S3FN.txt [claimed to be spam]
http://dpaste.com/3ZZ733Z.txt [claimed to be not spam]
the original message is encapsulated as an attachment, so i was planning
on extracting it and discarding the rest of the report message, unless
sa-learn is smart enough to handle the wrapper on its own?
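for what it's worth, here is a rough sketch of the extraction step i have in mind, using python's stdlib email package. it assumes the original is carried as a message/rfc822 mime part, which i haven't confirmed for every zimbra report, and `extract_original` is just a name i made up:

```python
import email
from typing import Optional


def extract_original(raw: bytes) -> Optional[bytes]:
    """Return the first message/rfc822 part of a spam/ham report as bytes.

    Assumption: the reporting system wraps the original message as a
    message/rfc822 MIME part. Returns None if no such part is found.
    """
    # The default (compat32) policy keeps parsed headers close to their
    # original form when the message is serialized again.
    report = email.message_from_bytes(raw)
    # walk() is pre-order, so the outermost encapsulated message is found
    # before any message nested inside it.
    for part in report.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a list holding
            # exactly one Message: the encapsulated original.
            inner = part.get_payload(0)
            return inner.as_bytes()
    return None
```

the returned bytes could then be written to a file and passed to `sa-learn --spam` or `sa-learn --ham`, depending on which mailbox the report came from.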
aside from that, i've read
https://wiki.apache.org/spamassassin/BayesInSpamAssassin and the
sa-learn(1) man page regarding spamassassin markup/headers, but would
appreciate any feedback on the samples above that might be pertinent,
e.g. particular headers i may not have considered removing.
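and a similarly rough sketch of the header cleanup. the prefix list below ("X-Spam-", "X-Virus-") is purely my guess at what to drop and is exactly the part i'd like feedback on; `strip_headers` and `DEFAULT_PREFIXES` are hypothetical names:

```python
import email
from typing import Iterable

# Hypothetical default: header-name prefixes to drop before learning.
# This list is a guess; adjust it after inspecting real samples.
DEFAULT_PREFIXES = ("X-Spam-", "X-Virus-")


def strip_headers(raw: bytes, prefixes: Iterable[str] = DEFAULT_PREFIXES) -> bytes:
    """Remove every header whose name starts with one of `prefixes`
    (case-insensitive) and return the re-serialized message."""
    msg = email.message_from_bytes(raw)
    lowered = tuple(p.lower() for p in prefixes)
    # Collect matching names first; deleting while iterating is unsafe.
    doomed = {name for name in msg.keys() if name.lower().startswith(lowered)}
    for name in doomed:
        # Removes all occurrences of this header, matched case-insensitively.
        del msg[name]
    return msg.as_bytes()
```

the idea would be to run this on the output of the extraction step, so only headers of the original message are affected.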
thanks
-ben