On 7/28/23 00:23, Bill Cole wrote:
1. There are milters/content filters that decode Base64 message parts (amavisd-new, mimedefang, etc.) for processing by SA.
2. There are still sufficiently unique items: a first-name-only greeting, a mixed-case word in the Subject (NLP modeling), and a Base64-encoded HTML attachment (with UTF-8 encoding, no less). Combined in a meta rule, these individually innocuous items will likely hit with good accuracy even without Base64 decoding.

Umm, unless I'm really missing something here the usual SA processing decodes such body stuff (QP, Base64, etc) and feeds the "cleaned" text to the rule processing engine.

Correct. It has nothing to do with the calling glue.

You have to work hard to get matches done on the raw stuff if you want to do special rule matching on the un-decoded body.

Correct. That should only be needed in rare cases where you're looking for a pattern in a non-text part.

I'm not sure why the OP's rule didn't match the target message, but it is NOT because of the Base64 encoding of parts with the 'text' primary MIME type. If I had to guess, I'd look for invisible characters hidden in the text (e.g. Unicode "zero width non-joiner" marks and the like) that break the pattern and for lookalike non-ASCII characters (often Cyrillic or Greek) in the target string.
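
The guess above (invisible or lookalike characters breaking the pattern) can be checked outside of SA. What follows is only a minimal sketch using Python's standard `email` and `unicodedata` modules. The function names and the `INVISIBLE` set are illustrative assumptions, not anything SA provides, and the set is not an exhaustive list of invisible characters. It decodes each text part the way a MIME-aware reader would, then lists every invisible or non-ASCII character with its Unicode name.

```python
import unicodedata
from email import message_from_bytes, policy

# Characters that render as nothing but still break a literal pattern match.
# Illustrative, not exhaustive.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff", "\u00ad"}


def decoded_text_parts(raw_message: bytes):
    """Yield (content_type, text) for every text/* part.

    get_content() undoes the transfer encoding (Base64, QP) and the
    charset, so the text returned is what a reader would actually see.
    """
    msg = message_from_bytes(raw_message, policy=policy.default)
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            yield part.get_content_type(), part.get_content()


def suspicious_chars(text: str):
    """Return (offset, code point, Unicode name) for each invisible or
    non-ASCII character in text."""
    hits = []
    for offset, ch in enumerate(text):
        if ch in INVISIBLE or ord(ch) > 127:
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((offset, f"U+{ord(ch):04X}", name))
    return hits
```

To use it, read a saved copy of one of the missed messages as bytes, pass it to `decoded_text_parts()`, and run `suspicious_chars()` on each returned text. A name such as ZERO WIDTH NON-JOINER or CYRILLIC SMALL LETTER A inside or next to the target string would explain why a plain ASCII pattern does not match.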

I am seeing the same issue. I get those same emails, with that 132.1532.1334 string or similar. SA is definitely not catching them, even though I dump them into my spam folder and run sa-learn --spam against them day after day. How can I check to see if it's actually decoding the base64? Or is that just a fact? It seems incredibly weird that I get these things every day, I mark them as spam every day, and they never hit more than a couple of points on the spam scale.

Thomas
