Alex, from prypiat.
Yes, I recycle.

On 12-10-25 03:04 PM, dar...@chaosreigns.com wrote:
> On 10/25, Bowie Bailey wrote:
>> On 10/25/2012 10:47 AM, Simon Loewenthal wrote:
>>> *  2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'
>>>
>>> Does anyone know the rationale behind this, or is our user base simply 
>>> communicating on a higher level?  :)  I imagine the rationale is sound, but 
>>> I do not know what it is.
>> The rationale is simple.  The masscheck finds that this rule hits
>> more spam than ham, so it gets a higher score.
> It's slightly more complicated than that.  It's that this score results in
> the maximum spams flagged as spam without exceeding 1 false positive in
> 2,500 non-spams.
>
> A fun example is SUBJ_YOUR_DEBT, which was getting a score of 3.0 while
> hitting more non-spam than spam.  I guess it got disabled somehow.
>
>
> But more importantly, it's because we do not have the rule
> hit statistics from your email to include them in optimal score
> generation because you're not submitting those stats via masscheck:
> https://wiki.apache.org/spamassassin/NightlyMassCheck
>
>
> RuleQA results for that rule are here:
> ruleqa.spamassassin.org/?daterev=20121020&rule=DEAR_SOMETHING
>
>   MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME   WHO/AGE
>       0   0.6160   0.2324   0.726    0.63    2.00  DEAR_SOMETHING  
>
> It hits 0.6% of spam, and 0.2% of non-spam (ham).
>
>
> On 10/25, Alexandre Boyer wrote:
>> Simon, I had some FPs because of this rule and because my threshold is
>> lower than 5.
> If you could just append "and I know this is highly discouraged"
> any time you say that, you might reduce my need to point it out to
> avoid you causing other people to think that might be a good idea.
> Scores are generated with a threshold of 5. It's often recommended to
> use a threshold above 5 for an extra safety measure.  Do you even have a
> guess what rate of false positives you're causing with a lower threshold?
> I don't.
>
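
A side note on the RuleQA line quoted above: the S/O column looks like it
is simply SPAM% / (SPAM% + HAM%). That definition is my assumption, not
something stated in the thread, but a quick check with the quoted numbers
is consistent with it:

```python
# Values copied from the RuleQA line quoted above.
spam_pct = 0.6160  # SPAM% column
ham_pct = 0.2324   # HAM% column

# Assumed definition of S/O: the share of the rule's hits that are spam,
# with the spam and ham hit rates weighted equally.
s_o = spam_pct / (spam_pct + ham_pct)

print(f"S/O = {s_o:.3f}")  # compare with the 0.726 in the S/O column
```

So roughly 73% of this rule's (normalised) hits are on spam.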

Well, "discouraged" was implicit (as is the fact that every admin is
responsible for their own config), but I will remember to spell out this
disclaimer should I mention this point again.

I know that my threshold is not good; I'm not satisfied with it but I
inherited this config from previous admins and all my maps (with
personal rules and score overrides) are "computed" with a threshold of 4.

It's not what I want, but I have to make do with it. Maybe one day
I will have SA working as it should. To answer your question, I don't
have that many FPs because SA is not the only engine used on my systems;
I have a bunch of other filters running. My catch rate is around 99.8%
and my FP rate is between 0.0001% and 0.03% (this is computed by an
independent source, but I have approximately the same internal stats on
my ham/spam feeds) on approximately 1.5 - 2 million messages a day.
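
To give an idea of what those percentages mean in absolute numbers, here
is a rough sketch. It assumes the FP rate applies to the total daily
message count, which is an assumption on my side (the rate may well be
relative to ham only); fp_per_day is just a helper name made up for the
example.

```python
def fp_per_day(messages_per_day, fp_rate_percent):
    """Estimated false positives per day for a given volume and FP rate (in %)."""
    return messages_per_day * fp_rate_percent / 100.0

# Best case: lowest volume with the lowest FP rate.
low = fp_per_day(1_500_000, 0.0001)
# Worst case: highest volume with the highest FP rate.
high = fp_per_day(2_000_000, 0.03)

print(f"roughly {low:.1f} to {high:.0f} false positives per day")
```

Under that assumption the range is wide, from one or two messages a day
up to several hundred.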

Going a little bit more off topic now:

The SA rule scores are computed from the mass-checks run by the
project and, to some extent, by contributors. A good question is: how
many contributors really give feedback on the mass-checks?

This is something I do not know, but the fewer they are, the greater the
bias is: bias in the spam and ham samples. The emails reaching my servers
are different from yours and from those of every other SA user.

Unless everybody on earth runs a nightly mass-check and reports results to
the SA project for it to compute a "world wide" scoring, there is a bias. At
least this is my understanding; maybe I'm wrong, please correct me if so.

For example, I'm in the process of learning to use mass-check to
contribute back to SA. That implies a lot of hard work simply to build
and maintain valid ham/spam corpora, then run mass-check, then hit-freq,
then fp-fn-stat. I'm not even close to understanding how to compute a
re-score, and the doc, when available, is not really clear about this.
But even with this, I'm not sure my contribution would be sufficient to
bring SA scores closer to the reality of my email traffic.

Do you have any stats about how many contributors are giving feedback
on the masscheck, and about their geographical location? I'm just asking
because I was not able to find this kind of information anywhere.

>> I just had a score override to lower it but this rule still hits a lot
>> of spam (419 scams essentially).
> Yup, nothing wrong with customizing your rules to suit the email you get
> better.  At least in the direction of reducing false positives.  
>
