Re: Subtest __E_LIKE_LETTER and __LOWER_E listed many times in message header

Chris Pollock Tue, 11 Dec 2018 13:37:35 -0800

On Mon, 2018-12-10 at 13:09 -0500, Bill Cole wrote:
> On 9 Dec 2018, at 18:23, Chris Pollock wrote:
> 
> > On Sun, 2018-12-09 at 13:06 -0500, Bill Cole wrote:
> > > On 9 Dec 2018, at 12:04, Chris Pollock wrote:
> > > 
> > > > This is probably very trivial and doesn't affect anything except
> > > > maybe the size of the headers but I have to ask. When looking at
> > > > the headers of some ham I noticed - https://pastebin.com/H7euxqVX
> > > > the two rules I mention above are in 72_active.cf. Is there a
> > > > reason for the number of times it's listed? Couldn't each subtest
> > > > be listed just once instead of multiple times?
> > > 
> > > Not with the current documented behavior of the code, given the way
> > > those sub-rules are designed to work together. The goal is to
> > > identify messages which use Latin-script 'e' characters but also use
> > > many non-Latin-script characters which look like 'e' but are not. To
> > > make this determination, the rules require the 'multiple' flag
> > > without a cap on the number of matches which a 'maxhits' parameter
> > > would set.
> > 
> > Got it, thanks Bill. I've never noticed this before. I also noticed
> > that according to my daily sa-update output this subtest is apparently
> > new or at least it didn't appear in the output until this past Fri.
> 
> Correct. See the thread with the subject "No longer just embedded =9D
> characters in blackmail emails" here last week for the background.
> 
> > > 
> > > It is not recommended to routinely add the list of matched
> > > sub-rules to scanned messages.
> > > 
> > 
> > Any specific reason why? This is just on my home system.
> 
> It's got the potential to be VERY noisy (as you've discovered) while
> not really providing much useful info.  Not a big deal on a small
> system.
> 
> 
> Anyway, as of today I've capped those 2 subrules at levels which
> leave ample space to still match the target spam. Should show up in
> tomorrow's update.


I see in today's update that the subrule was changed from this:

if can(Mail::SpamAssassin::Conf::feature_bug6558_free)
  ifplugin Mail::SpamAssassin::Plugin::ReplaceTags
    meta            T_MIXED_ES        ( __LOWER_E > 20 ) && ( __E_LIKE_LETTER > ( (__LOWER_E * 14 ) / 10) ) && ( ( __E_LIKE_LETTER / __LOWER_E ) < 10 )
    describe        T_MIXED_ES        Too many es are not es

To this:

if can(Mail::SpamAssassin::Conf::feature_bug6558_free)
  ifplugin Mail::SpamAssassin::Plugin::ReplaceTags
    body            __E_LIKE_LETTER /<E>/
    tflags          __E_LIKE_LETTER multiple

sa-update was run at 12:03 pm here on my box. A message that came in
well after the update still shows nearly the same output as before:

https://pastebin.com/aSXVj5ri

I can't see where the update made any difference, Bill. However, maybe I
don't understand the rule and it's doing what it's supposed to.
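
For reference, the T_MIXED_ES condition quoted above is plain arithmetic
on the two subrule hit counts. Here is a rough Python sketch of that
arithmetic only, not of how SpamAssassin evaluates meta rules. The
function name t_mixed_es and its two parameters are made up for
illustration; the thresholds (20, 14/10, 10) are copied from the rule as
quoted.

```python
def t_mixed_es(lower_e: int, e_like: int) -> bool:
    """Mirror the quoted T_MIXED_ES meta expression on two hit counts.

    lower_e: hypothetical stand-in for the __LOWER_E hit count
    e_like:  hypothetical stand-in for the __E_LIKE_LETTER hit count
    """
    # ( __LOWER_E > 20 ): checked first, which also guards the division below
    if lower_e <= 20:
        return False
    # ( __E_LIKE_LETTER > ( (__LOWER_E * 14 ) / 10) )
    exceeds_plain_es = e_like > (lower_e * 14) / 10
    # ( ( __E_LIKE_LETTER / __LOWER_E ) < 10 )
    ratio_below_ten = (e_like / lower_e) < 10
    return exceeds_plain_es and ratio_below_ten


if __name__ == "__main__":
    for counts in [(30, 50), (30, 40), (10, 100), (21, 300)]:
        print(counts, t_mixed_es(*counts))
```

Because the expression compares the two counts against each other, both
subrules need 'tflags multiple' so that every match is counted, which is
why each subrule name can show up many times when subtests are added to
the headers.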

-- 
Chris
KeyID 0xE372A7DA98E6705C
31.11972; -97.90167 (Elev. 1092 ft)
15:24:32 up 3 days, 19:48, 1 user, load average: 0.54, 0.55, 0.33
Description:    Ubuntu 18.04.1 LTS, kernel 4.15.0-42-generic
