On 7/28/23 00:23, Bill Cole wrote:
1. There are milters/content-filters that decode Base64 message parts
(amavisd-new, MIMEDefang, etc.) for processing by SA.
2. There are still sufficiently unique items: First-Name-Only, a
mixed-case word in the Subject (NLP modeling), and a Base64-encoded
HTML attachment (with UTF-8 encoding, no less). Combined in a meta
rule, these individually innocuous items will likely hit with good
accuracy even without Base64 decoding.
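The meta-rule idea in point 2 can be sketched in Python. In SpamAssassin itself this would be a `meta` rule combining non-scoring `__`-prefixed sub-rules; the functions below only mimic that logical AND. All three heuristics and their names are hypothetical, chosen for illustration (for instance, "First-Name-Only" is read here as a greeting line consisting of a single capitalized name), and the sample message is made up.

```python
import re
from email import message_from_string
from email.message import Message

# Hypothetical stand-ins for three SpamAssassin sub-rules; the names and
# heuristics are illustrative only, not rules that ship with SA.

def first_plain_text(msg: Message) -> str:
    """Decoded text of the first text/plain part (empty if none)."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            return payload.decode(part.get_content_charset() or "utf-8", errors="replace")
    return ""

def first_name_only(body: str) -> bool:
    # First non-blank line is a single capitalized word, e.g. "Thomas,"
    for line in body.splitlines():
        if line.strip():
            return re.fullmatch(r"[A-Z][a-z]+,?", line.strip()) is not None
    return False

def mixed_case_subject(subject: str) -> bool:
    # A lowercase letter immediately followed by an uppercase one, e.g. "PayMent"
    return re.search(r"[a-z][A-Z]", subject) is not None

def b64_utf8_html_attachment(msg: Message) -> bool:
    # A text/html attachment sent as base64 with a UTF-8 charset
    for part in msg.walk():
        if (part.get_content_type() == "text/html"
                and part.get_content_disposition() == "attachment"
                and part.get("Content-Transfer-Encoding", "").lower() == "base64"
                and (part.get_content_charset() or "").lower() == "utf-8"):
            return True
    return False

def meta_hit(msg: Message) -> bool:
    # Logical AND of the sub-rules, like: meta LOCAL_COMBO (__A && __B && __C)
    return (first_name_only(first_plain_text(msg))
            and mixed_case_subject(msg.get("Subject", ""))
            and b64_utf8_html_attachment(msg))

SAMPLE = """\
Subject: Your PayMent notice
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain; charset="utf-8"

Thomas,
see attached.
--XX
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="note.html"

PGI+aGk8L2I+
--XX--
"""

print(meta_hit(message_from_string(SAMPLE)))  # True
```

Each check on its own would misfire constantly; only the conjunction is meant to be selective, which is the argument of point 2.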
Umm, unless I'm really missing something here, the usual SA processing
decodes such body stuff (QP, Base64, etc.) and feeds the "cleaned"
text to the rule processing engine.
Correct. It has nothing to do with the calling glue.
You have to work hard to get matches done on the raw data if you
want to do special rule matching on the undecoded body.
Correct. That should only be needed in rare cases where you're looking
for a pattern in a non-text part.
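To illustrate the difference between the decoded text that body rules see and the raw message, here is a small Python sketch. It uses Python's standard `email` package as a stand-in for SpamAssassin's own MIME decoder, so it shows the principle rather than SA internals; the sample message is made up around the "132.1532.1334" string mentioned later in the thread.

```python
import base64
import re
from email import message_from_string

# A text/plain part whose body is sent with base64 transfer encoding.
TEXT = "Ref: 132.1532.1334\n"
RAW = (
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n" + base64.b64encode(TEXT.encode("utf-8")).decode("ascii") + "\n"
)
PATTERN = re.compile(r"132\.1532\.1334")

msg = message_from_string(RAW)
# get_payload(decode=True) undoes the base64 transfer encoding and returns bytes
decoded = msg.get_payload(decode=True).decode(msg.get_content_charset() or "utf-8")

raw_hit = PATTERN.search(RAW) is not None          # regex over the undecoded message
decoded_hit = PATTERN.search(decoded) is not None  # regex over the decoded text
print(raw_hit, decoded_hit)  # False True
```

A pattern written for the visible text only matches after decoding, which is why matching on the raw, still-encoded data takes deliberate extra work.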
I'm not sure why the OP's rule didn't match the target message, but it
is NOT because of the Base64 encoding of parts with the 'text' primary
MIME type. If I had to guess, I'd look for invisible characters hidden
in the text (e.g. Unicode "zero width non-joiner" marks and the like)
that break the pattern and for lookalike non-ASCII characters (often
Cyrillic or Greek) in the target string.
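The two things suggested above, invisible characters and lookalike letters, can be hunted for in the decoded text with Python's `unicodedata` module. This is a minimal sketch, assuming you have already extracted the decoded text of the message; treating every Unicode format character (category `Cf`) as invisible and every non-ASCII Cyrillic or Greek letter as a possible lookalike is a deliberately crude heuristic that will also flag legitimate Greek or Russian text.

```python
import unicodedata
from typing import Iterator, Tuple

def suspicious_chars(text: str) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (index, code point, Unicode name, reason) for characters that can
    silently break a pattern match."""
    for i, ch in enumerate(text):
        name = unicodedata.name(ch, "<unnamed>")
        if unicodedata.category(ch) == "Cf":
            # Format characters render as nothing: ZWNJ (U+200C), ZWSP (U+200B), etc.
            yield i, f"U+{ord(ch):04X}", name, "invisible format character"
        elif ord(ch) > 127 and ch.isalpha() and name.startswith(("CYRILLIC", "GREEK")):
            # Non-ASCII letters that may pass for Latin ones, e.g. Cyrillic "а"
            yield i, f"U+{ord(ch):04X}", name, "possible lookalike"

def strip_format_chars(text: str) -> str:
    """Drop Cf (format) characters so a plain regex can match again."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

# Made-up example: a ZWNJ inside "Invoice" and a Cyrillic "а" in "PayPal"
sample = "Inv\u200coice from P\u0430yPal"
for hit in suspicious_chars(sample):
    print(hit)
```

Stripping format characters fixes the invisible-character case, but lookalike letters survive it; those would have to be mapped back to Latin or allowed for in the rule's character classes.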
I am seeing the same issue. I get those same emails, with that
132.1532.1334 string or similar. SA is definitely not catching them,
even though I dump them into my spam folder and run sa-learn --spam
against them day after day. How can I check whether it's actually
decoding the Base64? Or is that just a fact? It seems incredibly weird
that I get these things every day, I mark them as spam every day, and
they never score more than a couple of points.
Thomas
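One way to check the decoding question outside SpamAssassin is to save one of the messages as a raw file (for example a `.eml` export from the spam folder; the file name is up to you) and print the decoded text of each `text/*` part with Python's `email` package. This sketch assumes Python 3.9 or later and only shows what a standard MIME decoder produces; it does not run SpamAssassin's own decoder.

```python
import sys
from email import policy
from email.parser import BytesParser

def dump_text_parts(path: str) -> list[tuple[str, str, str]]:
    """Return (content type, transfer encoding, decoded text) for each text/* part."""
    with open(path, "rb") as fh:
        msg = BytesParser(policy=policy.default).parse(fh)
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and non-text attachments
        cte = part.get("Content-Transfer-Encoding", "7bit")
        # get_content() undoes base64/QP and decodes the charset to str
        parts.append((part.get_content_type(), str(cte), part.get_content()))
    return parts

if __name__ == "__main__" and len(sys.argv) > 1:
    for ctype, cte, text in dump_text_parts(sys.argv[1]):
        print(f"--- {ctype} (Content-Transfer-Encoding: {cte})")
        print(text)
```

If the target string shows up readable in this output, the Base64 layer is not what hides it, and the invisible-character and lookalike checks suggested earlier in the thread are the next thing to try.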