On Fri, 2014-05-23 at 13:47 -0600, Kai Meyer wrote:
> On 05/22/2014 10:36 PM, Kai Meyer wrote:
> > On Fri, 23 May 2014 05:33:31 +0200, Karsten Bräckelmann wrote:

> > > Training as root rather than the system user receiving the mail (and
> > > calling SA) is only possible with a site-wide Bayes setup. The pasted
> > > configuration doesn't show that, either, so you would need to train as
> > > the mail receiving / scanning user.
> > 
> > Ya, that was what I was worried about. Just to clarify, postfix runs 
> > as the regular "postfix" user. I'm configured very similar to this:
> > http://www.akadia.com/services/postfix_spamassassin.html
 
> > [...]  My spamd service runs as the user spamd:
> > root      6188     1  0 15:56 ?        00:00:08 /usr/bin/spamd -d -m10 
> > -q -x -u spamd -r /var/run/spamd.pid
> > spamd     6190  6188  0 15:56 ?        00:01:27 spamd child

Given that your spamd daemon runs as a dedicated user, you are effectively
using a site-wide Bayes setup. Spamd does not change UID to the user
calling spamc. The other options that would give you per-user Bayes DBs
are administrator-only.

Thus, bayes training must be done as user spamd.

To get an overview of the Bayes DB currently used for classifying
incoming messages, run the following command as user spamd:

  sa-learn --dump magic

Given the terribly low overall scores you mentioned, I wouldn't be
surprised to see it totally biased, in which case starting over fresh
might be in order.
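If you do decide to start over, one possible sequence is sketched below as a small shell function. It is only a sketch under assumptions: sudo is available, and the backup path is a hypothetical placeholder. sa-learn's --backup writes a text dump of the Bayes DB to stdout (which --restore can read back later), and --clear wipes the DB.

```shell
# Sketch: snapshot, then wipe, the site-wide Bayes DB before retraining.
# Every sa-learn call runs as spamd so it acts on the DB spamd uses.
BACKUP=/var/tmp/bayes-backup.txt   # hypothetical location for the dump

reset_bayes() {
  sudo -u spamd sa-learn --dump magic          # note the current nspam / nham
  sudo -u spamd sa-learn --backup > "$BACKUP"  # text dump, restorable via --restore
  sudo -u spamd sa-learn --clear               # start over with an empty DB
}

# Usage:
#   reset_bayes
```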


> > So when I run spamassassin manually, I'm using sudo to switch to that 
> > user (cat test.mail.left | sudo -u spamd /usr/bin/spamc -u 
> > k...@gnukai.com > test.mail.right)

Running spamc as the spamd user is unnecessary if you pass the -u user
option to spamc.
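For example, the same test can be run from your own account with the file names from your command. The wrapper name check_mail is hypothetical; it just spells out the plain spamc invocation without sudo.

```shell
# Sketch: scan a saved message through spamd without sudo.
# spamc only hands the message to the running spamd daemon, so the
# calling user does not need to be spamd; -u names the user spamd
# should process the message for.
check_mail() {
  # $1 = input message, $2 = output file, $3 = user passed to spamc -u
  spamc -u "$3" < "$1" > "$2"
}

# Usage with the files from your example:
#   check_mail test.mail.left test.mail.right 'k...@gnukai.com'
```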

> > So if I turn sa-learn back on, I should make sure that I run it as the 
> > spamd user.

Yes.


> > I think by "setting up rules" I meant "adding configurations for pyzor 
> > and razor2" and the likes. Are they called plugins?

Yep, these are plugins, but that's just a technical detail. What you
meant is indeed "configuration" or "options". In SA, "rules" are all
the tests and patterns used to differentiate spam from ham. See the Report
and X-Spam-Status headers.


> So it seems that when I find a problem where command-line is scoring it 
> higher, it's always because of the addition of the URIBL_DB_SPAM score. 
> This seems like a "normal" issue then, and I can deal with that.
> 
> However, I'm getting email that is definitely spam, but it is getting
> negative scores. Should I seek further configuration help from this
> list? Or should I enable site-wide Bayesian learning? It seems like I've
> received 10-20 spam messages (about 40% of my usual volume that isn't
> filtered out of my inbox) in the last 12 hours. Is that considered
> "reasonable" and I just need to deal with it, or what?

Yes, do start training the Bayesian classifier by learning both ham and
spam, as the spamd user.
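A minimal training sketch, assuming you keep hand-sorted spam and ham in separate folders; both paths and the function names are hypothetical. sa-learn accepts a maildir or MH directory or a single message file (for mbox files you would add --mbox). By default Bayes only starts contributing to scores once 200 ham and 200 spam have been learned, so check the nspam / nham counts with --dump magic afterwards.

```shell
# Sketch: feed hand-sorted mail to the site-wide Bayes DB as spamd.
# Both paths are hypothetical placeholders; point them at your own folders.
SPAM_DIR=/path/to/sorted/spam
HAM_DIR=/path/to/sorted/ham

# Learn a folder as spam. Running sa-learn as spamd makes it write to
# the same Bayes DB that spamd reads when scanning.
learn_spam() {
  sudo -u spamd sa-learn --spam "$1"
}

# Learn a folder as ham, i.e. legitimate mail you want to keep.
learn_ham() {
  sudo -u spamd sa-learn --ham "$1"
}

# Usage, once the paths point at real folders:
#   learn_spam "$SPAM_DIR"
#   learn_ham "$HAM_DIR"
#   sudo -u spamd sa-learn --dump magic   # check the nspam / nham counts
```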

The number of unidentified spam messages is likely way too high, unless
you're actually getting thousands per day that are rejected outright at
the SMTP level and aren't part of these numbers.

Any SMTP-level rejects based on DNS BLs (e.g. Spamhaus), a virus filter,
or a high SA score? Any /dev/null'ing of incoming spam? How many ham and
spam messages per day?


> I'm happy to provide details, but I'm certain that copy-pasting an
> example spam email to this mailing list wouldn't produce desirable
> results. Perhaps I'm looking for a little hand holding, if anybody
> has the time. I'd be happy to take this offline, provide HTTP URLs to
> spam emails, etc.

In case there are other issues with your installation, providing samples
enables some helpful list members to run them through SA locally and
compare. Do not send samples to the list, though. Put the raw, original
message up on a pastebin and post the link.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
