Re: False Positive on SUBJECT_FUZZY_TION rule

Ned Slider Tue, 30 Sep 2008 16:15:46 -0700

Ned Slider wrote:

Hi List,
I'm getting some FP hits against the SUBJECT_FUZZY_TION rule in 25_replace.cf (SA 3.2.5, latest update):
header SUBJECT_FUZZY_TION       Subject =~ /<post P3>(?!tion)<T><I><O><N>/i
describe SUBJECT_FUZZY_TION     Attempt to obfuscate words in Subject:
replace_rules SUBJECT_FUZZY_TION


is hitting on ham from a mailing list with the following subject line:

Subject: Re: [CentOS] mount UFS partition on CentOS 5.
My regex isn't good enough to understand exactly what this rule is trying to achieve, but it looks to me like some kind of obfuscation of "tion" within a word, and to my untrained eye it appears to be hitting on "partition" in this case. A test email containing just the text "partition" in the subject line also hits this rule, which would appear to confirm my assumptions.
Could anyone help me understand what this rule is designed to hit, and why it's hitting in this case?
Thanks.



Replying to my own thread...

I'm assuming this rule is interpreting "tition" as an obfuscation of "tion", which is why it hits against "partition" as if it were an obfuscation of "partion".

Looking at some very crude stats for this rule against a recent corpus of ~1700 ham and ~1800 spam on my server, I see 13 FP hits against ham and only 1 hit against spam (an obfuscation of "erection"). Admittedly my ham corpus was a technical mailing list likely to contain the term "partition", given its common usage within IT, and triggering of the rule in no way got close to tagging any ham as spam.

Anyway, to me this rule doesn't appear to represent good value, so I'll probably just adjust the score to 0.001 and monitor it unless someone can suggest a method to prevent it hitting against legitimate words such as partition.
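
For reference, a minimal sketch of that override, assuming it goes in a site-wide local.cf (the exact path varies by install). The second part is only one possible way to sidestep "partition" and is untested against my corpus; the names __LOCAL_SUBJ_PARTITION and LOCAL_FUZZY_TION_NOPART are made up for illustration, and the 1.0 score is a placeholder rather than a tuned value:

    # Near-zero score: the stock rule still runs and shows up in reports,
    # but contributes almost nothing to the total
    score    SUBJECT_FUZZY_TION        0.001

    # Hypothetical workaround: sub-rule (double underscore, unscored)
    # that matches the legitimate word
    header   __LOCAL_SUBJ_PARTITION    Subject =~ /partition/i

    # Only count the fuzzy hit when "partition" is absent from the Subject
    meta     LOCAL_FUZZY_TION_NOPART   SUBJECT_FUZZY_TION && !__LOCAL_SUBJ_PARTITION
    describe LOCAL_FUZZY_TION_NOPART   SUBJECT_FUZZY_TION hit without "partition"
    score    LOCAL_FUZZY_TION_NOPART   1.0

The stock rule is kept at 0.001 rather than 0 because a score of 0 disables a rule entirely, and the meta rule could then never fire. Running spamassassin --lint after editing should catch any syntax slip.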
