Re: Proposal: SpamAssassin Mailet enhancements

Tellier Benoit Mon, 07 Oct 2019 23:33:14 -0700

Here is the JIRA: https://issues.apache.org/jira/projects/JAMES/issues/JAMES


You are free to create an account on it.

Cheers,

Benoit

On 08/10/2019 13:16, Jerry Malcolm wrote:
> I will do that.  But I need a bit of direction first.  I know what
> JIRA does.  But what do I need to do in order to get a JIRA account
> set up?  And is there a home page for JAMES JIRA?
>
> On 10/7/2019 9:34 PM, Tellier Benoit wrote:
>> Sounds like a great proposal!
>>
>> Would you mind opening a JIRA ticket about this?
>>
>> Best regards,
>>
>> Benoit
>>
>> On 08/10/2019 01:53, Jerry Malcolm wrote:
>>> As a bit of background for this proposal, I personally have no problem
>>> trading a small overhead of some additional headers in a delivered
>>> email if I can see the serviceability info I need directly in the
>>> email in 2 seconds vs scanning/filtering through massive logs
>>> (assuming the logs were even set to 'debug' level when the mail came
>>> through).  I realize that other admins may or may not be willing to
>>> make this tradeoff.  So the new function in this proposal is
>>> completely optional to the admin.  Not enabling it will result in no
>>> changes for the user.
>>>
>>> Problem: SpamAssassin is an incredible tool for controlling spam.  But
>>> the current version of the mailet hands an email to SA where it runs a
>>> billion rules and comes back with "yup it's spam" or "nope it's
>>> clean".   That's perfectly acceptable for every single email received
>>> where we agree with the SA result.  The problem comes in when a client
>>> complains about false positives or false negatives.  Currently, the
>>> only possible response we can give the client is "because SpamAssassin
>>> said so" with no analysis data.  When SA needs tuning, I need a
>>> fighting chance at seeing how the (possibly incorrect) score was
>>> derived so I can make adjustments to the SA rules as needed.
>>>
>>> Solution Proposal Background: The current implementation of the
>>> SpamAssassin mailet and SpamAssassinInvoker hardcodes the command to
>>> spamd as "CHECK" which returns yes or no with a bit of threshold
>>> info.  Another valid command option to spamd is "REPORT".  It gives
>>> back the same info as "CHECK".  But it also returns analysis data.
>>> Example:
>>>
>>> SPAMD/1.1 0 EX_OK
>>> Spam: False ; 3.9 / 5.0
>>>
>>> Spam detection software, running on the system "p5353013",
>>> has NOT identified this incoming email as spam.  The original
>>> message has been attached to this so you can view it or label
>>> similar future email.  If you have any questions, see
>>> webmas...@jwmhosting.com for details.
>>>
>>> Content preview:  ======================================== View on
>>> Facebook
>>>    https://www.facebook.com/nd/?Fox32Chic...
>>>     [...]
>>>
>>> Content analysis details:   (3.9 points, 5.0 required)
>>>
>>>  pts rule name                description
>>> ---- ------------------------ --------------------------------------------------
>>> -2.5 RCVD_IN_HOSTKARMA_W      RBL: Sender listed in HOSTKARMA-WHITE
>>>                               [69.171.232.132 listed in hostkarma.junkemailfilter.com]
>>> -0.0 RCVD_IN_MSPIKE_H2        RBL: Average reputation (+2)
>>>                               [69.171.232.132 listed in wl.mailspike.net]
>>> -0.1 RCVD_IN_DNSWL_NONE       RBL: Sender listed at https://www.dnswl.org/, no trust
>>>                               [69.171.232.132 listed in list.dnswl.org]
>>>  1.0 CK_HELO_DYNAMIC_SPLIT_IP Relay HELO'd using suspicious hostname (Split IP)
>>>  0.0 TVD_RCVD_IP              Message was received from an IP address
>>> -0.0 SPF_HELO_PASS            SPF: HELO matches SPF record
>>>  0.0 HTML_FONT_LOW_CONTRAST   BODY: HTML font color similar or identical to background
>>>  0.0 HTML_MESSAGE             BODY: HTML included in message
>>>  1.1 KAM_REALLYHUGEIMGSRC     RAW: Spam with image tags with ridiculously huge http urls
>>>  0.5 JAM_SMALL_FONT_SIZE      RAW: Body of mail contains parts with very small font
>>>  3.9 HELO_DYNAMIC_IPADDR2     Relay HELO'd using suspicious hostname (IP addr 2)
>>>  0.0 UNPARSEABLE_RELAY        Informational: message has unparseable relay lines
>>>
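The "Spam:" status line in the sample response above is common to CHECK and REPORT, so it can be read the same way for either command. Below is a minimal Java sketch of that parsing step. The class name SpamStatusLine is made up for illustration, and the True/False wording is assumed from the sample response; this is not the parsing code that James itself uses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only, not part of James.
final class SpamStatusLine {
    // Matches a line such as "Spam: False ; 3.9 / 5.0".
    // Assumption: the flag is spelled True or False, as in the sample response.
    private static final Pattern SPAM_LINE = Pattern.compile(
        "^Spam:\\s*(True|False)\\s*;\\s*(-?\\d+(?:\\.\\d+)?)\\s*/\\s*(-?\\d+(?:\\.\\d+)?)\\s*$",
        Pattern.CASE_INSENSITIVE);

    final boolean spam;      // verdict reported by spamd
    final double score;      // points assigned to the message
    final double threshold;  // points required to be flagged as spam

    private SpamStatusLine(boolean spam, double score, double threshold) {
        this.spam = spam;
        this.score = score;
        this.threshold = threshold;
    }

    // Parses one status line, rejecting anything that does not match.
    static SpamStatusLine parse(String line) {
        Matcher m = SPAM_LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a spamd status line: " + line);
        }
        return new SpamStatusLine(
            m.group(1).equalsIgnoreCase("True"),
            Double.parseDouble(m.group(2)),
            Double.parseDouble(m.group(3)));
    }

    public static void main(String[] args) {
        SpamStatusLine status = parse("Spam: False ; 3.9 / 5.0");
        System.out.println(status.spam + " " + status.score + " " + status.threshold);
    }
}
```
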
>>> Solution Proposal:
>>>
>>> a) Add a new SpamAssassin mailet parameter (in addition to spamdPort,
>>> spamdHost) in mailetContainer.xml named "spamdCommand".  Absence of
>>> this parameter will default to the current "CHECK" command.  The two
>>> valid options are CHECK and REPORT.
>>>
>>> b) Pass the specified spamdCommand (or default) to spamd in Invoker.
>>>
>>> c) If spamdCommand is REPORT, add the report data as headers to the
>>> email using the following procedure:  parse full response into a
>>> TreeMap using "X-SpamAssassin_nnn" as keys.  nnn is an incrementing
>>> number in case the headers get jumbled and/or alphabetized downstream.
>>>
>>> d) The input stream from SA must be reset after REPORT processing.  So
>>> add mark()/reset() to BufferedReader to rewind the reader so existing
>>> downstream processing is not affected.  A limit (currently 2500
>>> characters) must be set on mark(..).  Check to ensure that limit is
>>> not exceeded during REPORT processing.  If it gets close to the limit,
>>> stop and add a "more..." header and exit.
>>>
>>> e) Pass the reportData TreeMap to SpamAssassinResult on both empty(..)
>>> and build(..) methods.  SpamAssassinResult will walk the TreeMap data
>>> and add to HeadersPerRecipient in the same way existing processing
>>> adds the Spam flags.  SpamAssassinResult will also log the TreeMap
>>> header data.  Why on 'empty()'?  empty() is called if the parser can't
>>> find a specific key in the string. There still may be some kind of
>>> error output even if the key is missing that could be very useful in
>>> determining why the expected key wasn't returned.  So I recommend we
>>> dump what we got back from SA no matter what.
>>>
>>> f) The final step is to process the HeadersPerRecipient data into real
>>> headers.  It turns out this function is missing downstream and a
>>> defect report has been opened by Tellier.  So once the code that
>>> processes HeadersPerRecipient into actual headers is available, no
>>> additional work is required.
>>>
>>> g) Currently, if the command is REPORT, the reportData automatically
>>> goes both to the log and to headers.  It is not currently in my
>>> implementation, but we could add one more mailet parameter
>>> "reportAsHeaders=true/false" so admins could still get the report in
>>> the logs but not as headers.
>>>
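Steps (a) and (b) above could look roughly like the following Java sketch. Only the parameter name "spamdCommand" and the two values CHECK and REPORT come from the proposal. The enum, its method names, the "SPAMC/1.2" version string and the Content-length request header are assumptions based on the spamd protocol as I understand it, not the actual SpamAssassinInvoker code.

```java
import java.util.Locale;

// Hypothetical enum for illustration; James does not necessarily model it this way.
enum SpamdCommand {
    CHECK, REPORT;

    // Step (a): a missing or blank "spamdCommand" parameter falls back to CHECK.
    // Any value other than CHECK or REPORT is rejected with IllegalArgumentException.
    static SpamdCommand fromConfig(String value) {
        if (value == null || value.trim().isEmpty()) {
            return CHECK;
        }
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Step (b): the request head sent to spamd differs only in the verb.
    // Assumption: protocol version 1.2 and a Content-length header.
    String requestHeader(int contentLength) {
        return name() + " SPAMC/1.2\r\n"
            + "Content-length: " + contentLength + "\r\n"
            + "\r\n";
    }

    public static void main(String[] args) {
        SpamdCommand command = fromConfig(args.length > 0 ? args[0] : null);
        System.out.print(command.requestHeader(0));
    }
}
```

Rejecting unknown values at configuration time, rather than silently falling back to CHECK, is a design choice of this sketch; the proposal does not say which behaviour is intended.
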
>>> Summary: The admin can change the mailet command parameter to REPORT.
>>> When a user reports a false positive or false negative on an email,
>>> the admin can open the email in Thunderbird, hit Ctrl-U to view the raw
>>> source/headers, and immediately see the scoring details from
>>> SpamAssassin.  Obviously
>>> the next step for the admin would be to do something in SA to alter
>>> the scoring.  But that's beyond the scope of this proposal.
>>>
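The numbered headers an admin would see in the raw source could be produced along the lines of steps (c) and (d). The Java sketch below is one possible reading of those steps, not the implementation described in the proposal. The class and method names are hypothetical, and zero-padding the counter to three digits is an assumption (without padding, "_10" would sort before "_2" in a TreeMap). It reads character by character so that it can stop just short of the mark() limit and still reset() safely.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical helper for illustration only, not James code.
final class ReportHeaderSketch {
    static final int MARK_LIMIT = 2500; // limit quoted in step (d)

    // Reads the spamd response into numbered headers, then rewinds the reader
    // so that existing downstream parsing sees the stream from the start.
    static TreeMap<String, String> capture(BufferedReader reader, int markLimit) {
        TreeMap<String, String> headers = new TreeMap<>();
        StringBuilder line = new StringBuilder();
        boolean truncated = false;
        try {
            reader.mark(markLimit);
            for (int consumed = 0; ; consumed++) {
                // Stay strictly below the limit: reset() may fail once it is reached.
                if (consumed >= markLimit - 1) {
                    truncated = true;
                    break;
                }
                int c = reader.read();
                if (c == -1) {
                    break;
                }
                if (c == '\n') {
                    addHeader(headers, line.toString());
                    line.setLength(0);
                } else if (c != '\r') {
                    line.append((char) c);
                }
            }
            addHeader(headers, line.toString()); // last (possibly partial) line
            if (truncated) {
                addHeader(headers, "more...");
            }
            reader.reset();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return headers;
    }

    // Skips blank lines and numbers the rest in arrival order.
    private static void addHeader(TreeMap<String, String> headers, String value) {
        if (value.trim().isEmpty()) {
            return;
        }
        String key = String.format(Locale.ROOT, "X-SpamAssassin_%03d", headers.size() + 1);
        headers.put(key, value.trim());
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(
            new StringReader("SPAMD/1.1 0 EX_OK\r\nSpam: False ; 3.9 / 5.0\r\n"));
        capture(reader, MARK_LIMIT).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

The sketch is deliberately conservative: a response that is exactly one character short of the limit is also marked with "more...", because checking for end of stream would require reading up to the limit.
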
>>> I currently have this function coded and tested (currently with a hack
>>> to get around the bug in (f) above).  I have had an old implementation
>>> running in v3b5 for several years.  I can give personal testimony to
>>> the hours it has saved me.
>>>
>>> Comments/Suggestions welcome
>>>
>>> Jerry
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
>>> For additional commands, e-mail: server-dev-h...@james.apache.org
