Sounds like a great proposal! Would you mind opening a JIRA ticket about this?

Best regards,
Benoit

On 08/10/2019 01:53, Jerry Malcolm wrote:
> As a bit of background for this proposal, I personally have no problem
> trading a small overhead of some additional headers in a delivered
> email if I can see the serviceability info I need directly in the
> email in 2 seconds vs scanning/filtering through massive logs
> (assuming the logs were even set to 'debug' level when the mail came
> through). I realize that other admins may or may not be willing to
> make this tradeoff. So the new function in this proposal is
> completely optional to the admin. Not enabling it will result in no
> changes for the user.
>
> Problem: SpamAssassin is an incredible tool for controlling spam. But
> the current version of the mailet hands an email to SA where it runs a
> billion rules and comes back with "yup it's spam" or "nope it's
> clean". That's perfectly acceptable for every single email received
> where we agree with the SA result. The problem comes in when a client
> complains about false positives or false negatives. Currently, the
> only possible response we can give the client is "because SpamAssassin
> said so" with no analysis data. When SA needs tuning, I need a
> fighting chance at seeing how the (possibly incorrect) score was
> derived so I can make adjustments to the SA rules as needed.
>
> Solution Proposal Background: The current implementation of the
> SpamAssassin mailet and SpamAssassinInvoker hardcodes the command to
> spamd as "CHECK" which returns yes or no with a bit of threshold
> info. Another valid command option to spamd is "REPORT". It gives
> back the same info as "CHECK". But it also returns analysis data.
> Example:
>
> SPAMD/1.1 0 EX_OK
> Spam: False ; 3.9 / 5.0
>
> Spam detection software, running on the system "p5353013",
> has NOT identified this incoming email as spam. The original
> message has been attached to this so you can view it or label
> similar future email.
> If you have any questions, see
> webmas...@jwmhosting.com for details.
>
> Content preview: ======================================== View on
> Facebook
> https://www.facebook.com/nd/?Fox32Chic...
> [...]
>
> Content analysis details: (3.9 points, 5.0 required)
>
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
> -2.5 RCVD_IN_HOSTKARMA_W    RBL: Sender listed in HOSTKARMA-WHITE
>                             [69.171.232.132 listed in
>                             hostkarma.junkemailfilter.com]
> -0.0 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
>                             [69.171.232.132 listed in wl.mailspike.net]
> -0.1 RCVD_IN_DNSWL_NONE     RBL: Sender listed at https://www.dnswl.org/,
>                             no trust
>                             [69.171.232.132 listed in list.dnswl.org]
>  1.0 CK_HELO_DYNAMIC_SPLIT_IP Relay HELO'd using suspicious hostname
>                             (Split IP)
>  0.0 TVD_RCVD_IP            Message was received from an IP address
> -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
>  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or
>                             identical to background
>  0.0 HTML_MESSAGE           BODY: HTML included in message
>  1.1 KAM_REALLYHUGEIMGSRC   RAW: Spam with image tags with ridiculously
>                             huge http urls
>  0.5 JAM_SMALL_FONT_SIZE    RAW: Body of mail contains parts with very
>                             small font
>  3.9 HELO_DYNAMIC_IPADDR2   Relay HELO'd using suspicious hostname (IP
>                             addr 2)
>  0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay
>                             lines
>
> Solution Proposal:
>
> a) Add a new SpamAssassin mailet parameter (in addition to spamdPort,
> spamdHost) in mailetContainer.xml named "spamdCommand". Absence of
> this parameter will default to the current "CHECK" command. The two
> valid options are CHECK and REPORT.
>
> b) Pass the specified spamdCommand (or default) to spamd in Invoker.
>
> c) If spamdCommand is REPORT, add the report data as headers to the
> email using the following procedure: parse full response into a
> TreeMap using "X-SpamAssassin_nnn" as keys.
> nnn is an incrementing
> number in case the headers get jumbled and/or alphabetized downstream.
>
> d) The input stream from SA must be reset after REPORT processing. So
> add mark()/reset() calls on the BufferedReader to rewind the reader so
> existing downstream processing is not affected. A limit (currently 2500
> characters) must be set on mark(..). Check to ensure that limit is
> not exceeded during REPORT processing. If it gets close to the limit,
> stop and add a "more..." header and exit.
>
> e) Pass the reportData TreeMap to SpamAssassinResult on both empty(..)
> and build(..) methods. SpamAssassinResult will walk the TreeMap data
> and add to HeadersPerRecipient in the same way existing processing
> adds the Spam flags. SpamAssassinResult will also log the TreeMap
> header data. Why on 'empty()'? empty() is called if the parser can't
> find a specific key in the string. There still may be some kind of
> error output even if the key is missing that could be very useful in
> determining why the expected key wasn't returned. So I recommend we
> dump what we got back from SA no matter what.
>
> f) The final step is to process the HeadersPerRecipient data into real
> headers. It turns out this function is missing downstream and a
> defect report has been opened by Tellier. So when the code that
> processes HeadersPerRecipient into actual headers is available,
> no additional work is required.
>
> g) Currently if REPORT, the reportData automatically goes both to the
> log and to headers. Not currently in my implementation, but we could
> add one more mailet parm "reportAsHeaders=true/false" so admins could
> still get the report in the logs but not as headers.
>
> Summary: Admin can change the mailet parameter command to REPORT.
> When a user reports a false positive or false negative on an email,
> open the email in Thunderbird, hit Ctrl-U to view raw source/headers,
> and immediately see the scoring details from SpamAssassin.
> Obviously
> the next step for the admin would be to do something in SA to alter
> the scoring. But that's beyond the scope of this proposal.
>
> I currently have this function coded and tested (currently with a hack
> to get around the bug in (f) above). I have had an old implementation
> running in v3b5 for several years. I can give personal testimony to
> the hours it has saved me.
>
> Comments/Suggestions welcome
>
> Jerry
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
> For additional commands, e-mail: server-dev-h...@james.apache.org
>
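
For anyone reading steps (c) and (d) of the quoted proposal, here is a minimal Java sketch of how a REPORT response might be captured into numbered headers while leaving the reader rewound for the existing parsing. This is not the actual James code. The class name SpamdReportSketch, the method readReport, and the zero-padded three-digit suffix (chosen so the TreeMap's lexical order matches the numeric order) are assumptions made for illustration. The 2500-character limit and the "more..." marker come from the proposal itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

public class SpamdReportSketch {
    // Key prefix from the proposal ("X-SpamAssassin_nnn").
    static final String PREFIX = "X-SpamAssassin_";

    /**
     * Hypothetical helper: captures up to (limit - 1) characters of the spamd
     * response as numbered headers, then rewinds the reader so that later
     * parsing still sees the response from the beginning.
     */
    public static Map<String, String> readReport(BufferedReader reader, int limit)
            throws IOException {
        if (limit < 2) {
            throw new IllegalArgumentException("limit must be at least 2");
        }
        reader.mark(limit);

        // Read strictly fewer than 'limit' characters so the mark stays valid.
        char[] buf = new char[limit - 1];
        int total = 0;
        while (total < buf.length) {
            int n = reader.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of response
            }
            total += n;
        }
        reader.reset();

        Map<String, String> headers = new TreeMap<>();
        int index = 0;
        for (String line : new String(buf, 0, total).split("\r?\n")) {
            String value = line.trim();
            if (value.isEmpty()) {
                continue; // skip blank separator lines
            }
            headers.put(String.format("%s%03d", PREFIX, index++), value);
        }
        // The cap was reached: the last captured line may be cut short,
        // so flag that more report data may exist.
        if (total == buf.length) {
            headers.put(String.format("%s%03d", PREFIX, index), "more...");
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        String response = "SPAMD/1.1 0 EX_OK\r\n"
                + "Spam: False ; 3.9 / 5.0\r\n"
                + "\r\n"
                + "-2.5 RCVD_IN_HOSTKARMA_W RBL: Sender listed in HOSTKARMA-WHITE\r\n";
        BufferedReader reader = new BufferedReader(new StringReader(response));

        readReport(reader, 2500).forEach((k, v) -> System.out.println(k + ": " + v));

        // The reader was rewound, so existing parsing starts at the status line.
        System.out.println(reader.readLine());
    }
}
```

Reading into a fixed buffer rather than calling readLine() in a loop keeps the character count exact, so the mark cannot be invalidated by one unexpectedly long line. The trade-off is that a response longer than the limit ends with a partial line followed by the "more..." entry.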