Re: Weird characters (again) getting around filter rules.

Martin F via users Tue, 16 Dec 2025 03:05:47 -0800

I've had to deal with quite a bit of obfuscated spam over the years.

I started out with every possible obfuscation written into every rule, and whenever I discovered a new one, I had to go back and update every single rule. The rules were massive and completely unreadable.

Then I discovered replace_tags, which I can highly recommend looking into, if you haven't already:

https://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin_ReplaceTags.html
https://github.com/apache/spamassassin/blob/trunk/rules/25_replace.cf

Using this makes the rules much easier to read when you come back to them 6 months from now, and it's much easier to reuse the same obfuscations. Just update a tag in one place and the change applies to all rules using it.
(Sorry, that sounded like a horrible sales pitch from a TV advertisement or something..)

I've found the built-in tags are occasionally missing some special characters, so I made a replace_tag for every letter that includes the built-in one. Here are a couple of examples:

replace_tag        CUSTOM_C            (<C>|\xe1\xb4\x84)
replace_tag        CUSTOM_N            (<N>|\xe2\x93\x9d|\xc6[\x9e\x9d]|\xef\xbd\x8e)
replace_tag        CUSTOM_V            (<V>)

Then I can add other custom characters I find to each letter there, if the built-in tags are not catching the obfuscation.

I've found the easiest way to get the characters is a quick python for-loop:
>>> for c in "ṣҿṽҿral":
...     print(f"{c}: {c.encode('utf8')}")
...
ṣ: b'\xe1\xb9\xa3'
ҿ: b'\xd2\xbf'
ṽ: b'\xe1\xb9\xbd'
ҿ: b'\xd2\xbf'
r: b'r'
a: b'a'
l: b'l'
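
Going one step further, the same loop can emit the escapes in the form a replace_tag expects. This is only a sketch: the helper name to_replace_tag_alternation is made up for illustration, and it assumes the rules match against raw UTF-8 bytes, as the \x escapes in the examples above do.

```python
def to_replace_tag_alternation(chars):
    """Return a regex alternation of UTF-8 byte escapes, one branch per character."""
    branches = []
    # dict.fromkeys() drops repeated characters while keeping first-seen order.
    for ch in dict.fromkeys(chars):
        branches.append("".join(f"\\x{b:02x}" for b in ch.encode("utf-8")))
    return "|".join(branches)


# U+1E63, U+04BF, U+1E7D, U+04BF: the first four characters of the
# obfuscated "several" from the spam sample, written as escapes so the
# exact code points are unambiguous.
print(to_replace_tag_alternation("\u1e63\u04bf\u1e7d\u04bf"))
# prints: \xe1\xb9\xa3|\xd2\xbf|\xe1\xb9\xbd
```

The output still has to be sorted by hand into the right letter's tag; the helper only saves the typing.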

In the end, you can make either one rule that catches both the normal and obfuscated versions, or separate them so you can punish obfuscated versions even harder:

body __BODY_VIAGRA /(^|[^a-zA-Z0-9\.]|<CUSTOM_WORD_SEP>)viagra([^a-zA-Z0-9]|$)/i
body __BODY_VIAGRA_OBF /(^|[^a-zA-Z0-9]|<CUSTOM_WORD_SEP>)(?!\bviagra\b)<CUSTOM_V><CUSTOM_I><CUSTOM_A><CUSTOM_G><CUSTOM_R><CUSTOM_A>([^a-zA-Z0-9]|$)/i

replace_rules    __BODY_VIAGRA __BODY_VIAGRA_OBF
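
To get a feel for what the obfuscated rule ends up matching, the tag expansion can be approximated in Python before loading anything into SpamAssassin. This is a rough simulation, not how the plugin is implemented: the TAGS table is hypothetical and deliberately tiny (the built-in <V> is much broader than the stand-in here), and <CUSTOM_WORD_SEP> is left out because its definition isn't shown.

```python
import re

# Hypothetical tag table. Real definitions live in 25_replace.cf and in
# your own replace_tag lines; these are simplified stand-ins.
TAGS = {
    b"V": rb"v",
    b"CUSTOM_V": rb"(<V>|\xe1\xb9\xbd)",
    b"CUSTOM_I": rb"i",
    b"CUSTOM_A": rb"a",
    b"CUSTOM_G": rb"g",
    b"CUSTOM_R": rb"r",
}

TAG_REF = re.compile(rb"<([A-Z_]+)>")


def expand(pattern):
    """Substitute <TAG> references recursively; unknown tags are left alone."""
    def substitute(match):
        name = match.group(1)
        return expand(TAGS[name]) if name in TAGS else match.group(0)
    return TAG_REF.sub(substitute, pattern)


OBF_RULE = (rb"(^|[^a-zA-Z0-9])(?!\bviagra\b)"
            rb"<CUSTOM_V><CUSTOM_I><CUSTOM_A><CUSTOM_G><CUSTOM_R><CUSTOM_A>"
            rb"([^a-zA-Z0-9]|$)")
obf = re.compile(expand(OBF_RULE), re.IGNORECASE)

# Obfuscated spelling (first letter is U+1E7D as UTF-8 bytes): matches.
print(bool(obf.search(b"buy \xe1\xb9\xbdiagra now")))
# Plain spelling: the negative lookahead leaves it to __BODY_VIAGRA.
print(bool(obf.search(b"buy viagra now")))
```

Python's re and Perl's regex engine differ in places, so treat this as a quick sanity check and still test the real rules against sample mail.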

I would say start out with the built-in ones from the 25_replace.cf file, and if you see they're not catching certain characters, start creating your own versions and add those characters.

As others have pointed out, it might cause issues if you actually have people writing in languages that use those special characters, but that's the eternal joy of managing a spam filter..



On 12/15/25 2:04 AM, Mark London wrote:

Hi - One of our users got a bitcoin blackmail email that uses special characters to avoid the bitcoin spam rules. Does anybody have rules that detect this type of obfuscation? Thanks. - Mark
Begin forwarded message:
*From:* Ashley Adkins <[email protected]>
*Date:* December 12, 2025 at 3:51:30 PM EST
*Subject:* *Reminder! Check this message now*


Greetings!

I nҿҿd to inform bad nĕwṣ with you.
Approximately ṣҿṽҿral monthṡ ago I obtainễd accȩṡṡ to your gadgễtŝ,which you uṩẽ for wҿb _(krxvtgqb) _ṣurfing. Aftҿr that, I _(qofyata)_haⱱê ṥtartȅd tracking your intẹrnẹt activities.
Here iṩ thḗ ṣȇquȇncȇ of events:

--
Martin Flygenring (maf)
Systems Engineer, group.one / one.com
