Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

Sebastian Arcus Sun, 08 Apr 2018 00:53:15 -0700


On 07/04/18 17:14, Reindl Harald wrote:



On 07.04.2018 at 18:10, Sebastian Arcus wrote:

And the way I customise the scores is based on the type of emails
received at this particular site. It might seem "idiotic" to you, but
there are reasons for those scores. Not everyone receives the same mix
of email - so it isn't constructive to start calling other people's
scoring "idiotic" just because it is not the same as your own or the
defaults

if a single misfired rule turns a BAYES_00 message into a spam message it's
idiotic - it's that easy - with or without MSGID_SPAM_CAPS that can
happen at any moment in time, and when you trust your bayes, -0.2 is not
justified, and if you don't trust your bayes, train it

A default score of 3.1 for MSGID_SPAM_CAPS is pretty high - even compared with some of the DNS blacklist rules - and some of those are pretty powerful IMHO. That is why I was trying to understand why this rule is assigned such a high score and what its significance is.

Secondly, I found in the past that a high negative score for BAYES_00 is counter-productive, because:

1. As soon as you receive a spam message with a new type of content, it essentially has a free ride until it gets put through the bayes training - as the high negative score on BAYES_00 counteracts any other rule it hits - even pretty effective rules, such as Pyzor and blacklists.

2. Spammers have learned from the above, and I get a lot of spam which changes the wording all the time, so that bayes becomes essentially ineffective against it - but at the same time it stops other rules from working - because of the high negative scores on low BAYES.

3. Spammers have also learned from no. 1, and I see a lot of extremely short spam messages - just one short line of a few words. Bayes seems to be extremely ineffective on these very short messages, no matter how much you train it - because of the small amount of data to work on, and with a little bit of cunning and varying the words used - they all score as BAYES_00. Again, the high negative score gives these spammers a guaranteed free ride, as it overrides any other rules.

So at least from the type of spam that I see, BAYES_00 with a large negative score is really counter-productive and it makes SA far less efficient at picking out spam.
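
To make the arithmetic concrete, here is a rough Python sketch of how the summed rule scores play out. It is only an illustration, not how SA is implemented: the rule names HYPOTHETICAL_BLACKLIST and HYPOTHETICAL_PYZOR and all of the scores are made-up values for the example (the -0.2 is the BAYES_00 score mentioned earlier in this thread), and the threshold of 5.0 is assumed to be the usual required_score.

```python
# Illustrative sketch only: SpamAssassin adds up the scores of all rules a
# message hits and compares the total with required_score.
REQUIRED_SCORE = 5.0  # assumed threshold for this example


def is_spam(hits, scores, required=REQUIRED_SCORE):
    """Return (spam?, total) for the rules a message hit."""
    total = round(sum(scores[name] for name in hits), 2)
    return total >= required, total


# Hypothetical scores: a large negative BAYES_00 versus a small one.
large_negative = {
    "BAYES_00": -1.9,                # illustrative large negative score
    "HYPOTHETICAL_BLACKLIST": 3.0,   # made-up rule and score
    "HYPOTHETICAL_PYZOR": 2.5,       # made-up rule and score
}
small_negative = dict(large_negative, BAYES_00=-0.2)

# A short spam that bayes has never seen, but two other rules catch.
hits = ["BAYES_00", "HYPOTHETICAL_BLACKLIST", "HYPOTHETICAL_PYZOR"]

print(is_spam(hits, large_negative))  # (False, 3.6)
print(is_spam(hits, small_negative))  # (True, 5.3)
```

With the large negative score the two other hits are not enough to reach the threshold; with the small one the same message is tagged.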

BAYES_00 doesn't necessarily mean "I am sure this is not spam" - as a good quality whitelist rule would, for example. It may merely mean "I haven't really seen this type of spam before", or simply "this message is too short and I really can't say anything useful about it". For these reasons, I don't think low BAYES scores should be given large negative scores - which is why I changed them on my systems - with really good results.
