Hmm. That doesn't sound good. I sent a simple text message through a large ISP to my server, where it arrived in an mbox. I then compared that copy with the same message after it had been POPed, sent back to me as an attachment, and extracted with the existing script.
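A small header diff makes that comparison repeatable on other samples. The following is a minimal Python sketch that assumes both copies are saved as raw RFC 822 files. The script name and file names are placeholders. It prints the header field names present in the original but missing after the round trip. It compares names only, so it will not notice a header whose value changed, or a repeated header (such as Received) that lost some of its copies.

```python
"""Hypothetical hdrdiff.py: list header names lost in a forward-as-attachment round trip.

Usage: python3 hdrdiff.py original.eml extracted.eml
"""
import sys
from email import policy
from email.parser import BytesParser


def header_names(raw: bytes) -> set[str]:
    """Return the set of header field names (lower-cased) in a raw message."""
    msg = BytesParser(policy=policy.compat32).parsebytes(raw, headersonly=True)
    return {name.lower() for name in msg.keys()}


def missing_headers(original: bytes, extracted: bytes) -> list[str]:
    """Header names present in the original but absent from the extracted copy."""
    return sorted(header_names(original) - header_names(extracted))


# Only run the command-line part when given exactly two file paths.
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as a, open(sys.argv[2], "rb") as b:
        for name in missing_headers(a.read(), b.read()):
            print(name)
```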
These sanitized messages are pretty short, but I put them on pastebin: https://pastebin.com/b38RXHgx

When viewed in Outlook, the headers all appear intact, but forwarding as an attachment appears to strip these:
- Delivered-To:
- all X- headers added by my SA
- all X- headers added by the sending ISP (X-Yahoo*)
- Authentication results and DKIM signature
- Status: R

Otherwise the rest of the headers were unaffected. I'm not sure how badly stripping the X- headers, DKIM, etc. hurts Bayes learning. It doesn't SEEM that bad, but it's outside my skill set. Nor do I know how badly it mangles the other parts of a more complex message that SA needs to see, which some of you mentioned.

If I'm going to train at all, I need a way to get from Outlook to SA training. For most of my users, inbound mail is handed off to a third-party Exchange server that I don't have access to, so a solution such as a public IMAP folder on the Exchange server is probably not possible. And I presume that route alters the messages anyway. I can't cc the users' mail to my server for later review; there would be too much of it. If users forward me spam as attachments for learning, I would need to collect ham the same way.

My plan wasn't to make this a daily routine, only to help a few users who say too much spam is slipping through all the other checks untagged, and to train Bayes to help with those problem users. They have old email accounts that can't be changed and are on the golden spam lists.

The reason to "reassemble" the extracted attachments was just to make it easier for me to access and review the messages; doing that at the console is too tedious. I don't know how to use formail to do it, and won't it add even more headers to the mess?

FWIW, I did try sa-learn on a sample of extracted attachments in their raw form. It was happy with them:

[root@tn3 msg-1502747659-31280-0]# sa-learn --spam *
Learned tokens from 97 message(s) (97 message(s) examined)

But vetting them one by one at the console would be too tedious.
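For the "reassemble" step, Python's standard library can stand in for formail. The sketch below is hypothetical: the script name and the mbox path are placeholders, and it assumes Outlook delivers each forwarded item as a message/rfc822 MIME part. Items that arrive as Outlook .msg attachments are not handled. It pulls every attached message out of the forwarded messages and appends them to one mbox file that a mail client can open for review. The script itself adds only the mbox "From " separator line, not extra headers. Note that Python's `mailbox` module writes a plain mbox with ">From " escaping and no Content-Length headers, not strict mboxcl2, so check Dovecot's mbox documentation before pointing Dovecot at the result.

```python
"""Hypothetical fwd2mbox.py: pull message/rfc822 attachments into one mbox.

Usage: python3 fwd2mbox.py review.mbox FORWARDED_FILE...
"""
import io
import mailbox
import sys
from email import policy
from email.generator import BytesGenerator
from email.message import Message
from email.parser import BytesParser
from pathlib import Path


def _collect(part: Message, found: list[bytes]) -> None:
    """Depth-first search that stops at each attached message (no nesting)."""
    if part.get_content_type() == "message/rfc822":
        inner = part.get_payload(0)  # the attached message itself
        buf = io.BytesIO()
        # mangle_from_=False: leave body lines alone; mbox escaping happens later.
        # maxheaderlen=0: do not re-wrap header lines.
        BytesGenerator(buf, mangle_from_=False, maxheaderlen=0).flatten(inner)
        found.append(buf.getvalue())
    elif part.is_multipart():
        for sub in part.get_payload():
            _collect(sub, found)


def extract_attached_messages(raw: bytes) -> list[bytes]:
    """Return every message/rfc822 attachment of a forwarded message as bytes."""
    outer = BytesParser(policy=policy.compat32).parsebytes(raw)
    found: list[bytes] = []
    _collect(outer, found)
    return found


def append_to_mbox(messages: list[bytes], mbox_path: str) -> int:
    """Append raw messages to an mbox file (created if missing); return count."""
    box = mailbox.mbox(mbox_path, create=True)
    try:
        box.lock()
        for raw in messages:
            # Normalise CRLF to LF; mailbox writes the "From " separator line
            # and escapes body lines starting with "From " as ">From ".
            box.add(raw.replace(b"\r\n", b"\n"))
    finally:
        box.close()  # flushes, releases the lock, closes the file
    return len(messages)


# Only run the command-line part when given an mbox path and input files.
if __name__ == "__main__" and len(sys.argv) >= 3:
    extracted: list[bytes] = []
    for name in sys.argv[2:]:
        extracted.extend(extract_attached_messages(Path(name).read_bytes()))
    print(f"{append_to_mbox(extracted, sys.argv[1])} message(s) -> {sys.argv[1]}")
```

A run might look like `python3 fwd2mbox.py review.mbox fwd/*`. After vetting the messages in a mail client, the same file could be handed to sa-learn, whose `--mbox` option is meant for mbox input. Writing raw bytes rather than re-serialised message objects keeps the headers as close as possible to what Outlook attached.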
The extraction also gives them random, number-style filenames.

My constraints are:
- messages are delivered to a third-party Exchange server
- I currently have no access to that Exchange server
- users run Outlook 2003 or later
- I use site-wide Bayes
- I don't trust the users to feed Bayes
- I can't cc their email to my server for later feeding
- I want to use this process for corpus building, not daily maintenance

My plan was:
- receive spam and ham (separately) "as attachments" from Outlook
- extract the attachments
- review the attachments
- feed the attachments to sa-learn

I'm open to a better method. If someone is a formail guru, I'd be grateful for help with a command to assemble the messages so I can try it out: something that gets them into the mboxcl2 format my Dovecot uses (https://wiki2.dovecot.org/MailboxFormat/mbox) and that SA would be happy with.

Thanks

--
View this message in context: http://spamassassin.1065346.n5.nabble.com/message-rfc822-to-mbox-script-for-use-with-sa-learn-workflow-tp138362p138379.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.