On 1/11/2015 3:24 PM, Marieke Janssen wrote:

Hello all,

In some (apple|bank|creditcard) scam mail I found the following header and made a rule for it.

X-Get-Message-Sender-Via: cpanel3.example.org: authenticated_id: f0829646/only user confirmed/virtual account not confirmed

describe MJ_VACCOUNT virtual account not confirmed

header MJ_VACCOUNT X-Get-Message-Sender-Via =~ m;virtual account not confirmed;

I scored it 1.5, but I have no idea how to calculate a proper score. Could someone explain how to score a rule?


There are two ways to generate scores:

1 - you can get a corpus of ham and spam and use analysis to identify an optimal score
2 - you can use experience (real world and/or guesstimates) to set a score.

And to do #1, it's best to give it a starting point, which means #2 is needed a bit anyway ;-) So my first step is ALWAYS to check my hand-sorted corpora for anecdotal signs that a rule will classify properly. We measure this with the S/O ratio (spam hits over overall hits). See http://wiki.apache.org/spamassassin/S/O

If you have a rule that you think would benefit others, we can put it in a sandbox, where it is automatically tested and then promoted if it looks good. This is the Rule QA system, which is working but not reporting correctly on the ruleqa website.

To get back to your question, the score for a spam rule starts with a number based on how likely the rule is to misfire.

Overall, SA is designed as a framework that uses LOTS of different rules, some of which can misfire, while hopefully still classifying the email correctly.

So I generally score a single rule at about 1.0 as a max. BUT I use meta rules a lot, and depending on the number of metas involved, I may weigh each part of the meta, which lets me raise the score considerably.

You can look at my KAM.cf for some guidance. In short, there are times and rules that benefit greatly from analysis and some that you just have to make a guess.

More specifically, your rule above hits 11 items in my spam corpora, but it also hits 10 in my ham corpora, including a message from a user on the SA mailing list and another sent to the SA Board mailing list.

This means it has an S/O of roughly 0.5 (11 of its 21 hits are spam), so it is worthless for classification: it hits spam and ham about equally often.
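For anyone who wants to reproduce the arithmetic, here is a minimal sketch in Python. It assumes S/O can be approximated from raw hit counts as spam hits divided by total hits. The function name s_over_o is made up for illustration, and the Rule QA reports may normalize by corpus size, which this sketch ignores.

```python
def s_over_o(spam_hits: int, ham_hits: int) -> float:
    """Fraction of a rule's hits that landed on spam.

    Assumption: raw hit counts only, with no normalization by corpus size.
    """
    total = spam_hits + ham_hits
    if total == 0:
        raise ValueError("rule has no hits, so S/O is undefined")
    return spam_hits / total


# Counts quoted above for MJ_VACCOUNT: 11 spam hits, 10 ham hits.
ratio = s_over_o(11, 10)
print(f"S/O = {ratio:.3f}")
```

A value near 1.0 means the rule hits almost only spam, a value near 0.0 means it hits almost only ham, and a value around 0.5 tells you nothing either way.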

Sorry to say but I would recommend scrapping the rule.

Regards,
KAM
