On 23 Jun 2015, at 0:05, Michael B Allen wrote:

On Mon, Jun 22, 2015 at 10:42 PM, Bill Cole
<sausers-20150...@billmail.scconsult.com> wrote:
On 22 Jun 2015, at 21:45, Michael B Allen wrote:

On Mon, Jun 22, 2015 at 8:01 PM, Reindl Harald <h.rei...@thelounge.net>
wrote:

[root@www .spamassassin]# pwd
/var/log/spamassassin/.spamassassin
[root@www .spamassassin]# ls -la
total 1100
drwx------ 2 spamd spamd    4096 Jun 22 19:42 .
drwx------ 3 spamd spamd    4096 Jun  7 00:41 ..
-rw------- 1 spamd spamd   45056 Jun 22 19:42 bayes_seen
-rw------- 1 spamd spamd 1290240 Jun 22 19:42 bayes_toks
-rw-r--r-- 1 spamd spamd    1869 Jun  7 00:41 user_prefs



I doubt that SA is using the bayes of root,
so you just trained the wrong bayes


So with a default install (CentOS 7 in my case and I suspect pretty
much all other systems), bayes will NOT just work by default unless
you explicitly modify /etc/mail/spamassassin/local.cf to tell sa-learn
to use the bayes db owned by spamd
(/var/log/spamassassin/.spamassassin/bayes in my case) and NOT the one
owned by root?

However, I have done this:

bayes_path /var/log/spamassassin/.spamassassin/bayes
bayes_file_mode 0777


Don't do that, ever, on any regular file, on any system that has processes running as more than just root. I know it's in the SA Wiki, but it's an irresponsible recommendation.

Yeah, I was going to ask about this because it seems to me if the db
is owned by spamd and spamassassin is running as user spamd and
sa-learn is running as root then 0600 should be fine (although it's
not obvious to me why SA needs a "file mode" in the first place).

A diversity of rigs. SA isn't the spamd daemon or the spamc client or $PERLLIB/Mail/SpamAssassin.pm or the configured ruleset, it is the whole tree of Perl modules in the Mail::SpamAssassin namespace plus *maybe* spamd/spamc, the rules, and subsidiary utilities using them like sa-learn. Different sites use the Perl framework and tools in different ways, so they need different ownership & permission settings. As I don't use SpamAssassin on CentOS (or RHEL) I'm not sure precisely what the default SA rig there looks like, and how (if at all) RedHat has hooked it into Postfix(?) so I can't explain much about the specifics of what you get from 'yum install spamassassin'.

More specifically: in its simplest form, SA is designed to be used by each of many unprivileged users with independent Bayes DBs fed & used by local mail delivery and pre-delivery filtering processes and sa-learn or an equivalent tool for learning messages post-delivery. The bayes_file_mode defaults to 0700 and usually need not be changed, but on some OS's with some mail subsystems it may be necessary to adjust that to allow a delivery agent or other component (e.g. filtering tools) running as something other than root OR the individual mail recipient to read or maybe even write to the users' individual Bayes DBs. You should NOT need it changed on a system that only uses a system-wide Bayes DB.

So then what do you recommend that the bayes_file_mode value be precisely?

The default is usually fine. That's why it is the default. Note that this value is only applied when creating a new file in the Bayes DB (which is composed of multiple files) so it is possible for the effects of changing it to be delayed. If RedHat's packaging of SpamAssassin includes a different value, I'd suggest not changing it. Also, moving your DB into /var/log/spamassassin/ is a quirky choice that might not be compatible with RedHat's integration choices in the package they distribute (and which CentOS replicates.) It's your system and your choices of course...

At any rate, the whole thing seems to be working now incidentally. I
am getting BAYES_XX tags now.

Yes. As documented, you don't get messages scored by the Bayes component until it has built an adequate learned history of both ham and spam to do valid scoring.
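
One way to check whether that learned history is big enough is to look at the counters in 'sa-learn --dump magic'. The sketch below is a hypothetical helper, not part of SpamAssassin: bayes_ready is a made-up name. It assumes the count sits in the third whitespace-separated field of the "non-token data: nspam" and "nham" lines, and it uses 200 as the threshold because that is the documented default for bayes_min_spam_num and bayes_min_ham_num.

```shell
#!/bin/sh
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and
# report whether both counters have reached MIN (default 200, the documented
# default of bayes_min_spam_num and bayes_min_ham_num).
# Assumption: the count is the 3rd field of the nspam/nham lines.
bayes_ready() {
    min=${1:-200}
    awk -v min="$min" '
        /non-token data: nspam$/ { nspam = $3 }
        /non-token data: nham$/  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            # exit status 0 only when both counters reach the threshold
            exit !((nspam + 0) >= (min + 0) && (nham + 0) >= (min + 0))
        }'
}
```

Run as the owner of the DB, something like 'sa-learn --dump magic | bayes_ready' would print the two counters and return success only once both are at or above the threshold.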

As stated in my other followup message,
SA seems to have detected the broken db and fixed it because it
suddenly just started working and sa-learn --dump magic works and is
showing the right numbers.

Well, I'm not convinced that's exactly how it worked, but I'm glad you seem to have it working.

Note that 'sa-learn' DOES NOT talk to spamd, it uses the SA config that it finds for the user running it to figure out which rules it should use and where to find the Bayes DB (and AWL or TxRep DBs) for that user. If you have spamd running to use a system-wide config/ruleset/Bayes/(AWL|TxRep) you should get in the habit of using spamc to communicate with the daemon rather than running sa-learn as root and relying on a quirky config to assure that you are handling DB files that are global and owned by the right non-root user. If in doing that you cause the creation of a file in the DB that is owned by root and can't be deleted by spamd, your DB will be broken.
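
A quick way to spot the broken-DB case described above (a file in the DB that the daemon's user does not own) is to list any Bayes files with the wrong owner or with world-writable permissions. This is only a sketch: check_bayes_files is a hypothetical name, the 'bayes*' pattern assumes the usual bayes_toks / bayes_seen style file names, and spamd as the owner is an assumption taken from the listing earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list Bayes DB files that look wrong.
# Prints every bayes* file under DIR that is not owned by OWNER or that is
# world-writable; prints nothing when everything looks sane.
check_bayes_files() {
    dir=$1                 # e.g. /var/log/spamassassin/.spamassassin
    owner=${2:-spamd}      # assumed DB owner; adjust for your install
    find "$dir" -type f -name 'bayes*' \
        \( ! -user "$owner" -o -perm -002 \) -print
}
```

For the layout shown earlier, 'check_bayes_files /var/log/spamassassin/.spamassassin' should print nothing; any path it does print is a file to chown or chmod before spamd trips over it.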


So just for posterity, the problem was I just needed "bayes_path
/var/log/spamassassin/.spamassassin/bayes" in local.cf to make
sa-learn use that db instead of /root/.spamassassin/bayes. Looks like
it choked initially but somehow it's working now.

Yeah, that seems like a very wrong solution. Not saying it didn't work for you, but it would not be my choice. Since you seem set on having a weird place for your DB, I won't argue the issue.


Everything is installed as user / group spamd and postfix is set to
call spamassassin with user=spamd. And I assume I must run sa-learn as
root so that it can access Maildir directories and that bayes_path
tells sa-learn where the db is. So now what's the problem?


Wrong assumption.

The sa-learn program is for anyone to manually work with their own Bayes DB, including for the owner of a system-wide Bayes DB to work with that Bayes DB. If you have a system-wide Bayes DB, it should be fed by (a) a system-wide filtering mechanism operating as part of the delivery process and running as the owner of the global DB, (b) users running the spamc client under their own ids to feed a spamd daemon running as the owner of the global DB, or (c) a combination of the two. The CentOS 7 package installs spamd and spamc, and if you want to learn already-delivered mail into a global Bayes DB, those are the tools to use.

Yes, I want a system-wide bayes db. And I am running spamd and spamc
and I assume that is all working (but of course I have no idea if it
really is).

But I want users to be able to put spams that get through into
~/Maildir/.LearnAsSpam and then, every once in a while, I want to run
sa-learn on all of those messages for the system-wide db.

So can that be done without running sa-learn as root?

Of course. As I said in other words that you quoted but apparently misunderstood:

***** sa-learn IS NOT THE RIGHT TOOL FOR LEARNING MESSAGES INTO A SYSTEM-WIDE DB *****

Use 'spamc -L (spam|ham)'. Have users run it if they like, or have it run as the user whose magic maildirs are being learned. It talks to the spamd daemon, running as the spamd user, managing the system-wide Bayes DB. If it isn't run as root, it can't do random violence limited only by your capacity for typos.
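
For the ~/Maildir/.LearnAsSpam workflow described above, the spamc approach could look roughly like the following. It is a sketch under assumptions: learn_maildir and the SPAMC override variable are hypothetical names, the folder is assumed to be a standard Maildir with cur/ and new/ subdirectories, and spamd must have been started with --allow-tell or it will refuse learning requests.

```shell
#!/bin/sh
# Hypothetical helper: feed every message in one Maildir folder to spamd.
# SPAMC may be overridden (e.g. for a dry run); it defaults to the real client.
SPAMC="${SPAMC:-spamc}"

learn_maildir() {
    dir=$1      # e.g. "$HOME/Maildir/.LearnAsSpam"
    type=$2     # "spam" or "ham"
    for msg in "$dir"/cur/* "$dir"/new/*; do
        [ -f "$msg" ] || continue          # skip unmatched globs
        # spamc reads the message on stdin and asks spamd to learn it
        "$SPAMC" -L "$type" < "$msg"
    done
}
```

A user (or a cron job running as that user, not as root) would then call 'learn_maildir "$HOME/Maildir/.LearnAsSpam" spam'. Moving or deleting messages after learning is left out on purpose.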

Ideally I would think sa-learn should be able to run as root just to
access files but use a spamd child to process them and update the
bayes db. Possible?

That's not how any of this works...

The reason for the 'd' in spamd is that it is a daemon: a long-running process that other processes (or network entities) can talk to via a local unix socket in the filesystem or a TCP port using a defined protocol. The sa-learn program is not a client of spamd speaking that protocol but rather a direct manipulator of the BayesDB, just as spamd is.

You can usually get away with using sa-learn to work with the same BayesDB that spamd uses, but you are likely to eventually do something a little wrong and either screw up the BayesDB with a file spamd can't write to or accidentally and blindly work with a brand new different BayesDB because of some environmental change or because you've re-installed SA or whatever. I don't think there's a real risk of deadlock or data corruption or anything like that from using spamd and sa-learn on the same DB, but you do have 2 tools that are unaware of each other potentially trying to write to the same files, so there is at least some possibility for contention problems.

And as you've noticed: to learn messages in anyone's maildirs into the system BayesDB, you have to run sa-learn as root, because it isn't talking to spamd at all but fiddling with spamd's files behind spamd's back. Running things as root should be resisted and avoided. Use spamc instead, avoid the risks.


--
Bill Cole                      Email: b...@scconsult.com
18847 Rosetta Ave.             USE THE FROM HEADER IF IT DIFFERS!
Eastpointe, MI USA 48021       MAIN ADDRESS IS HEAVILY SPAM-FILTERED!
Phone: +1-586-774-4357
