On Fri, 2014-05-23 at 13:47 -0600, Kai Meyer wrote:
> On 05/22/2014 10:36 PM, Kai Meyer wrote:
> > On Fri, 23 May 2014 05:33:31 +0200, Karsten Bräckelmann wrote:
> > > Training as root rather than the system user receiving the mail (and
> > > calling SA) is only possible with site-wide Bayes setup. The pasted
> > > configuration doesn't show that, either, so you would need to train as
> > > the mail receiving / scanning user.
> >
> > Ya, that was what I was worried about. Just to clarify, postfix runs
> > as the regular "postfix" user. I'm configured very similar to this:
> > http://www.akadia.com/services/postfix_spamassassin.html
> >
> > [...] My spamd service runs as the user spamd:
> > root 6188 1 0 15:56 ? 00:00:08 /usr/bin/spamd -d -m10
> > -q -x -u spamd -r /var/run/spamd.pid
> > spamd 6190 6188 0 15:56 ? 00:01:27 spamd child

Given your spamd daemon runs as a dedicated user, you are effectively
using a site-wide Bayes setup. Spamd does not change UID to the user
calling spamc. Other options causing per-user bayes DB are
administrator only.

Thus, bayes training must be done as user spamd.

To get an overview of your current bayes DB used for classifying
incoming messages, run the following command (as user spamd).

  sa-learn --dump magic

Given the terribly low overall scores you mentioned, I wouldn't be
surprised to see it totally biased. In which case starting over fresh
might be in order.

> > So when I run spamassassin manually, I'm using sudo to switch to that
> > user (cat test.mail.left | sudo -u spamd /usr/bin/spamc -u
> > k...@gnukai.com > test.mail.right)

Running spamc as the spamd user is unnecessary, if you provide the
-u user option to spamc.

> > So if I turn sa-learn back on, I should make sure that I run it as the
> > spamd user.

Yes.

> > I think by "setting up rules" I meant "adding configurations for pyzor
> > and razor2" and the likes. Are they called plugins?

Yep, these are plugins, but that's just a technical detail. What you
meant indeed was "configuration" or "options".

In SA, "rules" are all the tests and patterns to differentiate spam
from ham. See the Report and X-Spam-Status header.

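Going back to the Bayes check earlier in this reply, here is a minimal
sketch of how the `sa-learn --dump magic` output could be summarised. It
assumes the usual layout, in which the learned-message counts sit in the
third column of the `nspam` and `nham` lines. The `bayes_summary` helper
and the sample directories in the comments are hypothetical; the default
threshold of 200 matches SA's `bayes_min_spam_num` / `bayes_min_ham_num`
defaults, below which Bayes does not score at all.

```shell
# Summarise `sa-learn --dump magic` output read from stdin.
# Assumption: the counts appear as
#   <prob> <spam> <count> <atime>  non-token data: nspam
# so the value of interest is the third whitespace-separated field.
# Optional first argument: minimum number of learned messages (default 200).
bayes_summary() {
    awk -v min="${1:-200}" '
        /non-token data: nspam/ { nspam = $3 }
        /non-token data: nham/  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            if (nspam >= min && nham >= min)
                print "enough training data for Bayes to score"
            else
                print "below the minimum; Bayes will not score yet"
        }'
}

# Usage, as the user that owns the Bayes DB (here: spamd):
#   sudo -u spamd sa-learn --dump magic | bayes_summary
#
# Training, with hypothetical sample directories:
#   sudo -u spamd sa-learn --spam /path/to/spam-samples
#   sudo -u spamd sa-learn --ham  /path/to/ham-samples
```

A heavily lopsided pair of counts (say, thousands of spam and a handful
of ham) would be one hint that the DB is biased and worth rebuilding.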
> So it seems that when I find a problem where command-line is scoring it
> higher, it's always because of the addition of the URIBL_DB_SPAM score.
> This seems like a "normal" issue then, and I can deal with that.
>
> However, I'm getting email that is definitely spam, but they are getting
> negative scores. Should I seek out further configuration help from this
> list? Or should I enable site-wide bayesian learning? It seems like I've
> received 10-20 spam messages (about 40% of my usual volume that isn't
> filtered out of my inbox) in the last 12 hours. Is that considered
> "reasonable" and I just need to deal with it, or what?

Yes, do start training the Bayesian Classifier by learning both ham
and spam. As the spamd user.

The number of unidentified spam likely is way too high, unless you're
actually getting thousands per day which are outright rejected at the
SMTP level, and not part of these numbers.

Any SMTP level rejecting based on DNS BLs (e.g. Spamhaus), virus
filter, or high SA score? Any /dev/null'ing of incoming spam? Numbers
of ham and spam per day?

> I'm happy to provide details, but I'm certain that copy-pasting an
> example spam email to this mailing list wouldn't produce desirable
> results. Perhaps I'm looking for a little hand holding, if anybody
> has the time. I'd be happy to take this offline, provide http urls to
> spam emails, etc.

In case there are other issues with your installation, providing
samples enables some helpful list members to run them through SA
locally and compare.

Do not send samples to the list, though. Put the raw, original message
up on a pastebin and post the link.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}