On 28 Sep 2019, at 13:20, Jerry Malcolm wrote:

On 9/28/2019 9:38 AM, Matus UHLAR - fantomas wrote:
On 28 Sep 2019, at 0:24, Jerry Malcolm wrote:
Understood.  I'm definitely stopping and starting the spamd service. (Although it's called the spamassassin service, it is definitely starting and stopping spamd.)

I've done a ton of digging around.  I located:

/usr/lib/systemd/system/spamassassin.service that starts /usr/bin/spamd using options file /etc/sysconfig/spamassassin and writes the log to /var/log/maillog.

In the maillog it says it is loading options from /var/lib/spamassassin/3.004000/updates_spamassassin_org/local.cf

I checked, and that file has required_score 4.0.  Yet the rest of the log file shows scores of x.x/5.0.

So I tried adding an option --cf=required_score 4.0 to the options file.  No change.

Then I tried adding it directly to the spamd invocation in the service file.  No matter how many places I tell it I want 4.0, something is still overriding it to 5.0.  Any other places you can think of that I can look?

On 9/27/2019 11:49 PM, Bill Cole wrote:
What are the full command line options for spamd?

'ps aux |grep spamd' should tell you the ground truth.

On 28.09.19 00:21, Jerry Malcolm wrote:
With my extra parameter added....

/usr/bin/perl -T -w /usr/bin/spamd --pidfile /var/run/spamd.pid -D -d -c -m5 -H --cf=required_score 4.0

the "required_score 4.0" should be enclosed in quotes or apostrophes.
Or set it in a config file.
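
A minimal shell sketch of why the quotes matter on a command line: without them, the shell splits the option at the space, so spamd would receive `--cf=required_score` and `4.0` as two separate arguments. The helper `count_args` is hypothetical and only reports how many arguments it was given; it does not call spamd.

```shell
# count_args is a hypothetical helper: it prints the number of
# arguments the shell passed to it after word splitting.
count_args() {
  printf '%s\n' "$#"
}

# Unquoted: the space splits the option into two arguments.
count_args --cf=required_score 4.0      # prints 2

# Quoted (apostrophes or double quotes): the setting arrives as one argument.
count_args --cf='required_score 4.0'    # prints 1
count_args --cf="required_score 4.0"    # prints 1
```

How an options file or service file splits its contents may differ from an interactive shell, so checking the running process with 'ps aux |grep spamd' remains the ground truth.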

further, the empty -H changes how configs are used:

   "By specifying no argument, spamd will use the spamc caller's home directory
          instead."

so, the calling user $HOME/.spamassassin/user_prefs is used

Matus,

Apparently, the whole problem was the quotes.  I added the quotes to the command line options, and it finally works.  I didn't try adding quotes in the local.cf file.  But it makes sense.  Note though, that the commented "required_score" line in the shipped version of local.cf does not have quotes.  Perhaps quotes should get added to that file in the distribution if they are required.

They are not required in a config file. They are only required on a command line.

So now at least I know how to set the threshold. 

You've found one way, but there's still the puzzle of which config file is actually being used by spamd, since you changed the threshold in some file that was clearly NOT the operative local.cf.
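
One rough way to narrow that puzzle down is to look at every candidate file and see which ones set a threshold at all. The sketch below assumes a hypothetical helper, `show_required_score`; the first and last paths in the usage line come from this thread, while /etc/mail/spamassassin/local.cf is only an assumed common location for the site config and may not apply to this system.

```shell
# show_required_score is a hypothetical helper: for each readable file
# given, print every uncommented required_score line as file:line:text.
show_required_score() {
  for f in "$@"; do
    [ -r "$f" ] || continue
    # No match in a file is not an error here, so ignore grep's exit status.
    grep -Hn '^[[:space:]]*required_score' "$f" || true
  done
}

# Candidate files; adjust to the paths that exist on the system in question.
show_required_score \
  /var/lib/spamassassin/3.004000/updates_spamassassin_org/local.cf \
  /etc/mail/spamassassin/local.cf \
  "$HOME/.spamassassin/user_prefs"
```

With the empty -H, the user_prefs that matters is the one in the spamc caller's home directory, so $HOME here would have to be that user's home.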

But my original question has spawned a separate discussion of whether it is the right thing to do to change the threshold.   I got one suggestion that, rather than reducing the threshold, I go in and rework the scoring on all of the rules in order to get my scores for obvious spam to rank above 5.0.  I appreciate all of the work and knowledge by the SA team and contributors that has gone into refining the scoring on all of the rules.  If I don't have enough background to correctly lower the threshold, I definitely don't have the background and experience (or time) to rework the scoring on a thousand rules.

The default rules, scores, and threshold are not Holy Writ. There is an automated process backed by human classification of ham and spam corpora which calculates some rule scores with an assumption of 5 as the threshold, but I can guarantee that those corpora are not representative of all mail, of all mail seen by SA, or of all mail handled by any single system. It is almost certainly true that the SA defaults are not the best possible fit for any site anywhere; they're just the best compromise we know how to come up with. In creating rules and determining whether they are good enough to publish, we have a substantial bias against false positives, inevitably meaning that SA will have some false negatives.

Adjusting the threshold is definitely the easiest way to deal with SA making too many mistakes on one side of the threshold or the other. In my experience, 4.0 is a reasonable level AFTER you've got Bayes and AWL or TxRep databases trained.

So the real question is.... why are MY scores on spam apparently lower than the main population of SA users?  I gotta believe that most users are processing emails just fine with a 5.0 threshold and not getting tons of uncaught spam.  I have added KAM.cf. 

Are you sure that your spamd is actually using the KAM.cf rules? I ask because of the unresolved question of what config files it is actually using.
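
Since spamd is running with -D and logging to /var/log/maillog, one possible check is to pull the config file names out of the debug output and see whether KAM.cf is among them. This is a sketch only: `list_read_configs` is a hypothetical filter, and it assumes the debug output contains lines of the form "config: read file PATH", which should be verified against the actual log.

```shell
# list_read_configs is a hypothetical filter: it reads SpamAssassin debug
# output on stdin and prints the path from each "config: read file" line.
# The exact wording of that debug line is an assumption.
list_read_configs() {
  sed -n 's/^.*config: read file //p'
}

# Assumed usage against the spamd log mentioned earlier in the thread:
#   grep spamd /var/log/maillog | list_read_configs | sort -u
# Or against a one-off lint run with debugging enabled:
#   spamassassin -D --lint 2>&1 | list_read_configs | grep -i kam
```

A lint run from a shell reads the configuration visible to the user who runs it, which is not necessarily what the spamd daemon reads, so the log of the running daemon is the more reliable source.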

But still a large percentage of spam gets scored between 4 and 5.  I understand that there are a billion different strains of spam and the spam that user X receives is different than the spam that user Y receives.  But my lower scores seem a bit too consistent for that to be the only problem.

I've worked with a lot of different mail streams and I think it is absolutely normal for a site to have that sort of tilt, especially one with a small number of users.

Just curious: do you have a set of test cases with an expected spam score that I could run through my SA and compare, and maybe isolate what rules might not be firing for me?

We do not publish test cases because there is really no hope of coming up with significant coverage in a reasonable number of test cases. The most common sources of excess false negatives are entirely local issues such as correctly set *_networks values and having a proper independent DNS resolver set up so that you can use the "free for most" DNSBL and URIBL services that block the heaviest users by resolver address.

It is fairly common for people with persistent false negative problems to ask about them here, usually posting the spam samples to PasteBin to avoid having messages to the list blocked as spam.

This is going to be an ongoing research problem for me. Not a show-stopper today.  But I would like to understand better about my situation.  I want to use SA as intended.

As a member of the SpamAssassin PMC I think that I'm safe in saying that the only "as intended" use is "whatever works for your particular circumstances."

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
