On 28 Sep 2019, at 13:20, Jerry Malcolm wrote:
On 9/28/2019 9:38 AM, Matus UHLAR - fantomas wrote:
On 28 Sep 2019, at 0:24, Jerry Malcolm wrote:
Understood. I'm definitely stopping and starting the spamd
service. (Although it's called the spamassassin service, it is
definitely starting and stopping spamd.)
I've done a ton of digging around. I located:
/usr/lib/systemd/system/spamassassin.service that starts
/usr/bin/spamd using options file /etc/sysconfig/spamassassin and
writes the log to /var/log/maillog.
In the maillog it says it is loading options from
/var/lib/spamassassin/3.004000/updates_spamassassin_org/local.cf
I checked, and that file has required_score 4.0. Yet the rest of
the log file shows scores of x.x/5.0.
So I tried adding an option --cf=required_score 4.0 to the options
file. No change.
Then I tried adding it directly to the spamd invocation in the
service file. No matter how many places I tell it I want 4.0,
something is still overriding it to 5.0. Any other places you
can think of that I can look?
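One way to confirm which threshold spamd is really applying is to pull the "score/threshold" pairs out of the log and tally the threshold values. The following is only a minimal sketch: the sample lines are hypothetical stand-ins, the function name `thresholds_seen` is made up for illustration, and real maillog entries may be laid out differently.

```python
import re
from collections import Counter

# Matches a "(score/threshold)" pair such as "(4.3/5.0)".
SCORE_RE = re.compile(r"\((-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)\)")


def thresholds_seen(log_lines):
    """Count how often each threshold value appears in the given log lines."""
    counts = Counter()
    for line in log_lines:
        match = SCORE_RE.search(line)
        if match:
            counts[float(match.group(2))] += 1
    return counts


# Hypothetical sample lines, not copied from a real log.
sample = [
    "spamd: clean message (4.3/5.0) for jerry:1000 in 1.2 seconds",
    "spamd: identified spam (7.9/5.0) for jerry:1000 in 0.9 seconds",
    "spamd: connection from localhost",
]

for threshold, count in thresholds_seen(sample).items():
    print(f"threshold {threshold}: {count} messages")
```

If every message still reports 5.0 after a config change and a restart, the change was presumably made somewhere spamd does not read, or was not parsed the way it was intended.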
On 9/27/2019 11:49 PM, Bill Cole wrote:
What are the full command line options for spamd?
'ps aux |grep spamd' should tell you the ground truth.
On 28.09.19 00:21, Jerry Malcolm wrote:
With my extra parameter added....
/usr/bin/perl -T -w /usr/bin/spamd --pidfile /var/run/spamd.pid -D
-d -c -m5 -H --cf=required_score 4.0
the "required_score 4.0" should be enclosed in quotes or
apostrophes.
Or put it in a config file.
further, the empty -H changes how configs are used:
"By specifying no argument, spamd will use the spamc caller's
home directory
instead."
so, the calling user's $HOME/.spamassassin/user_prefs is used
Matus,
Apparently, the whole problem was the quotes. I added the quotes to
the command line options, and it finally works. I didn't try adding
quotes in the local.cf file. But it makes sense. Note though, that
the commented "required_score" line in the shipped version of local.cf
does not have quotes. Perhaps quotes should get added to that file
in the distribution if they are required.
They are not required in a config file. They are only required on a
command line.
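The reason the quotes matter on a command line can be shown with Python's `shlex.split`, which models POSIX shell word splitting. This is a sketch under that assumption; the way systemd or the sysconfig file splits its options may differ in detail, and the command line here is shortened from the one quoted earlier in the thread.

```python
import shlex

# Without quotes, the space splits the setting into two separate arguments.
unquoted = "spamd -d -c -m5 -H --cf=required_score 4.0"
# With quotes, the whole setting stays attached to --cf as one argument.
quoted = "spamd -d -c -m5 -H --cf='required_score 4.0'"

unquoted_args = shlex.split(unquoted)
quoted_args = shlex.split(quoted)

print(unquoted_args[-2:])  # ['--cf=required_score', '4.0']
print(quoted_args[-1:])    # ['--cf=required_score 4.0']
```

In the unquoted case the program receives `--cf=required_score` with no value, followed by a stray `4.0` argument. In a config file the whole line is read as one setting, so no quoting is needed there.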
So now at least I know how to set the threshold.
You've found one way, but there's still the puzzle of which config file
is actually being used by spamd, since you changed the threshold in some
file that was clearly NOT the operative local.cf.
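To narrow down which file is operative, it can help to list every active `required_score` line across the places spamd might read. The sketch below makes several assumptions: `/etc/mail/spamassassin/local.cf` is an assumed site config location that may differ per distribution, the `user_prefs` path is expanded for whoever runs the script (which is not necessarily the spamc caller), and the parsing ignores conditional blocks and included files. The helper name `find_required_score` is hypothetical.

```python
import os
from pathlib import Path


def find_required_score(paths):
    """Return (path, line_number, value) for each active required_score line."""
    hits = []
    for path in map(Path, paths):
        if not path.is_file():
            continue  # skip candidates that do not exist on this system
        text = path.read_text(errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            # Drop anything after '#', then split the rest into words.
            fields = line.split("#", 1)[0].split()
            if len(fields) >= 2 and fields[0] == "required_score":
                hits.append((str(path), number, fields[1]))
    return hits


candidates = [
    "/etc/mail/spamassassin/local.cf",  # assumed site config location
    "/var/lib/spamassassin/3.004000/updates_spamassassin_org/local.cf",
    os.path.expanduser("~/.spamassassin/user_prefs"),  # this user's prefs
]

for path, number, value in find_required_score(candidates):
    print(f"{path}:{number}: required_score {value}")
```

A file that shows 4.0 here while the log still reports 5.0 is a hint that spamd is not reading that file, or that a later file overrides it.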
But my original question has spawned a separate discussion of whether
it is the right thing to do to change the threshold. I got one
suggestion that, rather than reducing the threshold, I go in and
rework the scoring on all of the rules in order to get my scores for
obvious spam to rank above 5.0. I appreciate all of the work and
knowledge by the SA team and contributors that has gone into refining
the scoring on all of the rules. If I don't have enough background
to correctly lower the threshold, I definitely don't have the
background and experience (or time) to rework the scoring on a
thousand rules.
The default rules, scores, and threshold are not Holy Writ. There is an
automated process backed by human classification of ham and spam corpora
which calculates some rule scores with an assumption of 5 as the
threshold, but I can guarantee that those corpora are not representative
of all mail, of all mail seen by SA, or of all mail handled by any
single system. It is almost certainly true that the SA defaults are not
the best possible fit for any site anywhere; they're just the best
compromise we know how to come up with. In creating rules and
determining whether they are good enough to publish, we have a
substantial bias against false positives, inevitably meaning that SA
will have some false negatives.
Adjusting the threshold is definitely the easiest way to deal with SA
making too many mistakes on one side of the threshold or the other. In
my experience, 4.0 is a reasonable level AFTER you've got Bayes and AWL
or TxRep databases trained.
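The trade-off in moving the threshold can be made concrete by counting mistakes on each side for a hand-sorted batch of mail. This is an illustrative sketch only: the (score, is_spam) pairs are made up, not measured data, and `misclassified` is a hypothetical helper. It assumes a message is flagged as spam when its score is at or above the threshold.

```python
def misclassified(scored_mail, threshold):
    """Return (false_negatives, false_positives) at the given threshold."""
    false_negatives = sum(
        1 for score, is_spam in scored_mail if is_spam and score < threshold
    )
    false_positives = sum(
        1 for score, is_spam in scored_mail if not is_spam and score >= threshold
    )
    return false_negatives, false_positives


# Made-up (score, is_spam) pairs for illustration only.
hand_sorted = [
    (4.2, True), (4.7, True), (6.1, True), (9.3, True),
    (0.4, False), (2.8, False), (4.1, False),
]

for threshold in (5.0, 4.0):
    fn, fp = misclassified(hand_sorted, threshold)
    print(f"threshold {threshold}: {fn} missed spam, {fp} ham flagged")
```

With these made-up numbers, lowering the threshold from 5.0 to 4.0 catches the two missed spam messages at the cost of flagging one legitimate message. Running the same count on real, hand-classified mail is what shows whether 4.0 suits a particular site.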
So the real question is.... why are MY scores on spam apparently lower
than the main population of SA users? I gotta believe that most
users are processing emails just fine with a 5.0 threshold and not
getting tons of uncaught spam. I have added KAM.cf.
Are you sure that your spamd is actually using the KAM.cf rules? I ask
because of the unresolved question of what config files it is actually
using.
But still a large percentage of spam gets scored between 4 and 5.
I understand that there are a billion different strains of spam and
the spam that user X receives is different from the spam that user Y
receives. But my lower scores seem a bit too consistent for that to
be the only problem.
I've worked with a lot of different mail streams and I think it is
absolutely normal for a site to have that sort of tilt, especially one
with a small number of users.
Just curious: do you have a set of test cases that have an expected spam
score that I could run through my SA and compare, and maybe isolate
what rules might not be firing for me?
We do not publish test cases because there is really no hope of coming
up with significant coverage in a reasonable number of test cases. The
most common sources of excess false negatives are entirely local issues
such as correctly set *_networks values and having a proper independent
DNS resolver set up so that you can use the "free for most" DNSBL and
URIBL services that block the heaviest users by resolver address.
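For background on why the resolver matters: a DNSBL check is an ordinary DNS lookup of the reversed IPv4 address under the list's zone, so the list operator sees the query coming from whatever resolver you use. The sketch below shows how the query name is built. The zone `dnsbl.example` is a placeholder, not a real list, and both function names are hypothetical. The `lookup` helper is defined but not called here, since it needs network access.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name used to check an IPv4 address against a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def lookup(name):
    """Resolve a name through the system resolver; None means no answer."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None


# 127.0.0.2 is the conventional test entry described in RFC 5782.
print(dnsbl_query_name("127.0.0.2", "dnsbl.example"))
```

Calling `lookup` on the test name for a list you actually use, from the mail server itself, is one rough way to see whether your resolver gets a normal answer from that list.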
It is fairly common for people with persistent false negative problems
to ask about them here, usually posting the spam samples to PasteBin to
avoid having messages to the list blocked as spam.
This is going to be an ongoing research problem for me. Not a
show-stopper today. But I would like to understand my situation
better. I want to use SA as intended.
As a member of the SpamAssassin PMC I think that I'm safe in saying that
the only "as intended" use is "whatever works for your particular
circumstances."
--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)