Re: Setting Threshold (Resolved)

Bill Cole Sat, 28 Sep 2019 13:34:30 -0700

On 28 Sep 2019, at 13:20, Jerry Malcolm wrote:

On 9/28/2019 9:38 AM, Matus UHLAR - fantomas wrote:
On 28 Sep 2019, at 0:24, Jerry Malcolm wrote:
Understood. I'm definitely stopping and starting the spamd service. (Although it's called the spamassassin service, it is definitely starting and stopping spamd.)
I've done a ton of digging around. I located:
/usr/lib/systemd/system/spamassassin.service that starts /usr/bin/spamd using options file /etc/sysconfig/spamassassin and writes the log to /var/log/maillog.
In the maillog it says it is loading options from /var/lib/spamassassin/3.004000/updates_spamassassin_org/local.cf
I checked, and that file has required_score 4.0. Yet the rest of the log file shows scores of x.x/5.0.
So I tried adding an option --cf=required_score 4.0 to the options file. No change.
Then I tried adding it directly to the spamd invocation in the service file. No matter how many places I tell it I want 4.0, something is still overriding it to 5.0. Any other places you can think of that I can look?
On 9/27/2019 11:49 PM, Bill Cole wrote:
What are the full command line options for spamd?

'ps aux |grep spamd' should tell you the ground truth.
On 28.09.19 00:21, Jerry Malcolm wrote:
With my extra parameter added....
/usr/bin/perl -T -w /usr/bin/spamd --pidfile /var/run/spamd.pid -D -d -c -m5 -H --cf=required_score 4.0
the "required_score 4.0" should be enclosed in quotes or apostrophes.
Or, in config file.

further, the empty -H changes how configs are used:
"By specifying no argument, spamd will use the spamc caller's home directory instead."

so, the calling user $HOME/.spamassassin/user_prefs is used
Matus,
Apparently, the whole problem was the quotes. I added the quotes to the command line options, and it finally works. I didn't try adding quotes in the local.cf file. But it makes sense. Note though, that the commented "required_score" line in the shipped version of local.cf does not have quotes. Perhaps quotes should get added to that file in the distribution if they are required.

They are not required in a config file. They are only required on a command line.
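
To make the quoting point concrete, here is a minimal sketch in plain POSIX shell. The show_args function is a hypothetical stand-in for spamd, not part of SpamAssassin; it only prints the words it receives. How the service file itself splits words is assumed to be similar, which matches what was reported above.

```shell
# Hypothetical helper (not part of SpamAssassin): print each argument on
# its own line so that word splitting becomes visible.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Unquoted: two separate words arrive, so the option value is only
# "required_score" and "4.0" is a stray extra argument.
show_args --cf=required_score 4.0

# Quoted: one word arrives, so the whole config line reaches --cf.
show_args --cf='required_score 4.0'
```

The first call prints two bracketed words and the second prints one, which is why the unquoted form never set the threshold.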

So now at least I know how to set the threshold.

You've found one way, but there's still the puzzle of which config file is actually being used by spamd, since you changed the threshold in some file that was clearly NOT the operative local.cf.
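
One way to chase that down is to pull the config file names out of the debug output. The sketch below is only a starting point: list_config_files is a hypothetical helper, and it assumes the debug lines have the form "config: read file <path>", so adjust the pattern if your version logs differently. Note also that the spamassassin command may not read the same files as a spamd started with other options.

```shell
# Hypothetical helper: print the paths named in SpamAssassin debug output
# read from stdin.
# Assumption: the relevant lines contain "config: read file <path>".
list_config_files() {
    sed -n 's/.*config: read file //p'
}

# Run the check only where the spamassassin command is installed.
if command -v spamassassin >/dev/null 2>&1; then
    spamassassin -D --lint 2>&1 | list_config_files
fi
```

Since spamd here runs with -D and logs to /var/log/maillog, the same filter can be pointed at that log with `list_config_files < /var/log/maillog` to see what the running daemon read.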

But my original question has spawned a separate discussion of whether it is the right thing to do to change the threshold. I got one suggestion that, rather than reducing the threshold, I go in and rework the scoring on all of the rules in order to get my scores for obvious spam to rank above 5.0. I appreciate all of the work and knowledge by the SA team and contributors that has gone into refining the scoring on all of the rules. If I don't have enough background to correctly lower the threshold, I definitely don't have the background and experience (or time) to rework the scoring on a thousand rules.

The default rules, scores, and threshold are not Holy Writ. There is an automated process backed by human classification of ham and spam corpora which calculates some rule scores with an assumption of 5 as the threshold, but I can guarantee that those corpora are not representative of all mail, of all mail seen by SA, or of all mail handled by any single system. It is almost certainly true that the SA defaults are not the best possible fit for any site anywhere, they're just the best compromise we know how to come up with. In creating rules and determining whether they are good enough to publish, we have a substantial bias against false positives, inevitably meaning that SA will have some false negatives.

Adjusting the threshold is definitely the easiest way to deal with SA making too many mistakes on one side of the threshold or the other. In my experience, 4.0 is a reasonable level AFTER you've got Bayes and AWL or TxRep databases trained.

So the real question is.... why are MY scores on spam apparently lower than the main population of SA users? I gotta believe that most users are processing emails just fine with a 5.0 threshold and not getting tons of uncaught spam. I have added KAM.cf.

Are you sure that your spamd is actually using the KAM.cf rules? I ask because of the unresolved question of what config files it is actually using.

But still a large percentage of spam gets scored between 4 and 5. I understand that there are a billion different strains of spam and the spam that user X receives is different than the spam that user Y receives. But my lower scores seem a bit too consistent for that to be the only problem.

I've worked with a lot of different mail streams and I think it is absolutely normal for a site to have that sort of tilt, especially one with a small number of users.
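
To put a number on that tilt, a rough sketch like the following counts how many logged scores land between 4 and 5. The count_borderline function is hypothetical, and it assumes each relevant log line carries the score as "(score/threshold)", for example "(4.3/5.0)"; adjust the pattern if your log lines look different.

```shell
# Hypothetical helper: count scores on stdin that a 4.0 threshold would
# catch but a 5.0 threshold misses.
# Assumption: scored lines contain "(score/threshold)", e.g. "(4.3/5.0)".
count_borderline() {
    awk '
        match($0, /\(-?[0-9]+\.[0-9]+\/[0-9]+\.[0-9]+\)/) {
            pair = substr($0, RSTART + 1, RLENGTH - 2)   # drop the parentheses
            split(pair, part, "/")                       # part[1] is the score
            total++
            if (part[1] + 0 >= 4.0 && part[1] + 0 < 5.0) borderline++
        }
        END {
            printf "%d of %d scored messages fall in [4.0, 5.0)\n", borderline, total
        }
    '
}
```

Running `count_borderline < /var/log/maillog` gives a quick sense of how many messages a move from 5.0 to 4.0 would reclassify. It says nothing about whether those messages are spam or ham, so it is worth eyeballing a sample before and after the change.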

Just curious: do you have a set of test cases that have an expected spam score that I could run through my SA and compare, and maybe isolate what rules might not be firing for me?

We do not publish test cases because there is really no hope of coming up with significant coverage in a reasonable number of test cases. The most common sources of excess false negatives are entirely local issues such as correctly set *_networks values and having a proper independent DNS resolver set up so that you can use the "free for most" DNSBL and URIBL services that block the heaviest users by resolver address.

It is fairly common for people with persistent false negative problems to ask about them here, usually posting the spam samples to PasteBin to avoid having messages to the list blocked as spam.

This is going to be an ongoing research problem for me. Not a show-stopper today. But I would like to understand better about my situation. I want to use SA as intended.

As a member of the SpamAssassin PMC I think that I'm safe in saying that the only "as intended" use is "whatever works for your particular circumstances."


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
