Sounds like a great proposal!

Would you mind opening a JIRA ticket for this?

Best regards,

Benoit

On 08/10/2019 01:53, Jerry Malcolm wrote:
> As a bit of background for this proposal, I personally have no problem
> trading a small overhead of some additional headers in a delivered
> email if I can see the serviceability info I need directly in the
> email in 2 seconds vs scanning/filtering through massive logs
> (assuming the logs were even set to 'debug' level when the mail came
> through).  I realize that other admins may or may not be willing to
> make this tradeoff.  So the new function in this proposal is
> completely optional to the admin.  Not enabling it will result in no
> changes for the user.
>
> Problem: SpamAssassin is an incredible tool for controlling spam.  But
> the current version of the mailet hands an email to SA where it runs a
> billion rules and comes back with "yup it's spam" or "nope it's
> clean".   That's perfectly acceptable for every single email received
> where we agree with the SA result.  The problem comes in when a client
> complains about false positives or false negatives.  Currently, the
> only possible response we can give the client is "because SpamAssassin
> said so" with no analysis data.  When SA needs tuning, I need a
> fighting chance at seeing how the (possibly incorrect) score was
> derived so I can make adjustments to the SA rules as needed.
>
> Solution Proposal Background: The current implementation of the
> SpamAssassin mailet and SpamAssassinInvoker hardcodes the command sent
> to spamd as "CHECK", which returns yes or no with a bit of threshold
> info.  Another valid spamd command is "REPORT".  It returns the same
> info as "CHECK" plus the analysis data.
> Example:
>
> SPAMD/1.1 0 EX_OK
> Spam: False ; 3.9 / 5.0
>
> Spam detection software, running on the system "p5353013",
> has NOT identified this incoming email as spam.  The original
> message has been attached to this so you can view it or label
> similar future email.  If you have any questions, see
> webmas...@jwmhosting.com for details.
>
> Content preview:  ======================================== View on
> Facebook
>   https://www.facebook.com/nd/?Fox32Chic...
>    [...]
>
> Content analysis details:   (3.9 points, 5.0 required)
>
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
> -2.5 RCVD_IN_HOSTKARMA_W    RBL: Sender listed in HOSTKARMA-WHITE
>                             [69.171.232.132 listed in hostkarma.junkemailfilter.com]
> -0.0 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
>                             [69.171.232.132 listed in wl.mailspike.net]
> -0.1 RCVD_IN_DNSWL_NONE     RBL: Sender listed at https://www.dnswl.org/,
>                              no trust
>                             [69.171.232.132 listed in list.dnswl.org]
>  1.0 CK_HELO_DYNAMIC_SPLIT_IP Relay HELO'd using suspicious hostname
>                             (Split IP)
>  0.0 TVD_RCVD_IP            Message was received from an IP address
> -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
>  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or
>                             identical to background
>  0.0 HTML_MESSAGE           BODY: HTML included in message
>  1.1 KAM_REALLYHUGEIMGSRC   RAW: Spam with image tags with ridiculously
>                              huge http urls
>  0.5 JAM_SMALL_FONT_SIZE    RAW: Body of mail contains parts with very
>                             small font
>  3.9 HELO_DYNAMIC_IPADDR2   Relay HELO'd using suspicious hostname (IP
>                             addr 2)
>  0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay
>                             lines
>
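A minimal sketch, in Java, of how the status line in the sample response above ("Spam: False ; 3.9 / 5.0") could be split into its verdict, score, and threshold. The class and method names are hypothetical and are not taken from the James code base; the regular expression assumes the line has exactly the shape shown in the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of James.
final class SpamdStatusLine {
    // Matches a line shaped like "Spam: False ; 3.9 / 5.0".
    private static final Pattern SPAM_LINE = Pattern.compile(
        "^Spam:\\s*(True|False)\\s*;\\s*(-?\\d+(?:\\.\\d+)?)\\s*/\\s*(-?\\d+(?:\\.\\d+)?)\\s*$",
        Pattern.CASE_INSENSITIVE);

    final boolean spam;
    final double score;
    final double threshold;

    private SpamdStatusLine(boolean spam, double score, double threshold) {
        this.spam = spam;
        this.score = score;
        this.threshold = threshold;
    }

    // Returns an empty Optional when the line is not a "Spam:" status line.
    static Optional<SpamdStatusLine> parse(String line) {
        Matcher m = SPAM_LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new SpamdStatusLine(
            Boolean.parseBoolean(m.group(1)),
            Double.parseDouble(m.group(2)),
            Double.parseDouble(m.group(3))));
    }

    public static void main(String[] args) {
        parse("Spam: False ; 3.9 / 5.0").ifPresent(s ->
            System.out.println("spam=" + s.spam + " score=" + s.score
                + " threshold=" + s.threshold));
    }
}
```

Lines that do not match, such as the "SPAMD/1.1 0 EX_OK" line, yield an empty result rather than an exception.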
> Solution Proposal:
>
> a) Add a new SpamAssassin mailet parameter (in addition to spamdPort,
> spamdHost) in mailetContainer.xml named "spamdCommand".  Absence of
> this parameter will default to the current "CHECK" command.  The two
> valid options are CHECK and REPORT.
>
> b) Pass the specified spamdCommand (or the default) to spamd in
> SpamAssassinInvoker.
>
> c) If spamdCommand is REPORT, add the report data as headers to the
> email using the following procedure:  parse full response into a
> TreeMap using "X-SpamAssassin_nnn" as keys.  nnn is an incrementing
> number in case the headers get jumbled and/or alphabetized downstream.
>
> d) The input stream from SA must be reset after REPORT processing.  So
> call mark()/reset() on the BufferedReader to rewind it, so that
> existing downstream processing is not affected.  A limit (currently
> 2500 characters) must be set on mark(..).  Check that the limit is
> not exceeded during REPORT processing.  If it gets close to the limit,
> stop, add a "more..." header, and exit.
>
> e) Pass the reportData TreeMap to SpamAssassinResult on both empty(..)
> and build(..) methods.  SpamAssassinResult will walk the TreeMap data
> and add to HeadersPerRecipient in the same way existing processing
> adds the Spam flags.  SpamAssassinResult will also log the TreeMap
> header data.  Why on 'empty()'?  empty() is called if the parser can't
> find a specific key in the string. There still may be some kind of
> error output even if the key is missing that could be very useful in
> determining why the expected key wasn't returned.  So I recommend we
> dump what we got back from SA no matter what.
>
> f) The final step is to process the HeadersPerRecipient data into real
> headers.  It turns out this function is missing downstream and a
> defect report has been opened by Tellier.  So once the code that
> processes HeadersPerRecipient into actual headers is available, no
> additional work is required.
>
> g) Currently, if the command is REPORT, the reportData automatically
> goes both to the log and to headers.  Not currently in my
> implementation, but we could add one more mailet parameter,
> "reportAsHeaders=true/false", so admins could still get the report in
> the logs but not as headers.
>
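Steps (c) and (d) above could be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the implementation described in the proposal: the class, method, and constant names are hypothetical, the zero-padding of "nnn" to three digits is an assumption made so that the TreeMap's lexicographic key order matches the line order, and the safety margin of 200 characters is an arbitrary stand-in for "close to the limit". Only the 2500-character mark limit and the "more..." marker come from the text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical sketch of steps (c) and (d); not the actual James code.
final class ReportHeaderSketch {
    static final int MARK_LIMIT = 2500;     // limit quoted in step (d)
    static final int SAFETY_MARGIN = 200;   // assumed meaning of "close to the limit"
    static final String KEY_PREFIX = "X-SpamAssassin_";

    // Zero-padded so that "X-SpamAssassin_010" sorts after "X-SpamAssassin_009".
    static String key(int index) {
        return String.format(Locale.ROOT, "%s%03d", KEY_PREFIX, index);
    }

    // Reads the response line by line into numbered headers, then rewinds
    // the reader so later processing sees the response from the start.
    static TreeMap<String, String> collectReport(BufferedReader reader) throws IOException {
        TreeMap<String, String> headers = new TreeMap<>();
        reader.mark(MARK_LIMIT);
        int consumed = 0;
        int index = 0;
        boolean stoppedEarly = false;
        while (true) {
            if (consumed >= MARK_LIMIT - SAFETY_MARGIN) {
                stoppedEarly = true;   // stop before the mark could become invalid
                break;
            }
            String line = reader.readLine();
            if (line == null) {
                break;                 // end of response
            }
            consumed += line.length() + 2;   // over-counts: assumes CRLF terminators
            if (!line.trim().isEmpty()) {    // blank lines are skipped (an assumption)
                headers.put(key(++index), line);
            }
        }
        if (stoppedEarly) {
            headers.put(key(++index), "more...");
        }
        // A single line longer than SAFETY_MARGIN could still overrun the mark;
        // a stricter version would bound each read individually.
        reader.reset();
        return headers;
    }

    public static void main(String[] args) throws IOException {
        String response = "SPAMD/1.1 0 EX_OK\r\nSpam: False ; 3.9 / 5.0\r\n\r\n"
            + "Content analysis details:   (3.9 points, 5.0 required)\r\n";
        BufferedReader reader = new BufferedReader(new StringReader(response));
        collectReport(reader).forEach((k, v) -> System.out.println(k + ": " + v));
        System.out.println("after reset: " + reader.readLine());
    }
}
```

Because the reader is rewound before returning, whatever parses the CHECK-style status line afterwards reads the same characters it would have seen without the REPORT handling.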
> Summary: The admin can change the mailet's spamdCommand parameter to
> REPORT.  When a user reports a false positive or false negative on an
> email, open the email in Thunderbird, hit Ctrl-U to view the raw
> source/headers, and immediately see the scoring details from
> SpamAssassin.  Obviously the next step for the admin would be to do
> something in SA to alter the scoring.  But that's beyond the scope of
> this proposal.
>
> I currently have this function coded and tested (currently with a hack
> to get around the bug in (f) above).  I have had an old implementation
> running in v3b5 for several years.  I can give personal testimony to
> the hours it has saved me.
>
> Comments/Suggestions welcome
>
> Jerry
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
> For additional commands, e-mail: server-dev-h...@james.apache.org
>
>
