RW <rwmaillists <at> googlemail.com> writes:

> On Wed, 18 Feb 2009 23:00:03 +0000 (UTC)
> Ray <rsk-gmane <at> misinformation.org> wrote:
> > * How do I determine what the current SA config is?
>
> The locations where spamassassin looks for configuration are listed in
> the main manpage.
I managed to find the config directory on this system, thanks for the pointer. I guess I have to parse all of these files to know how SA is actually config'd? Alas, I was hoping for something like Postfix's `postconf` to show the active/final configuration in its entirety. Where can one submit a feature request, and does this sound like a sensible one?

> If it appears to autolearning, then bayes and autolearning are enabled.

The magically incrementing `sa-learn --dump magic|grep am` values suggest so. It's odd that there isn't any indication in the "X-Spam-Status" header that this is happening, as one would expect after reading the wiki article AutolearnNotWorking.

> Note that autolearning uses its own, more conservative, rules, it's not
> based on the normal single threshold - you should use sa-learn to
> manually train too, if you can.

I noticed the additional thresholds for autolearning. I was hoping to do manual training only, but maybe that level of control is just not achievable in my circumstance. (The problem being that my headers may be bad for sa-learn.)

> By default Bayes scoring wont turn-on until you've learned 200 spam,
> and 200 ham (non-spam) messages. If you are going to make a judgement
> about moving the threshold then you should ignore the early mails that
> lack BAYES_* hits.

I imagine after Bayes scoring goes into effect I'll have a nicer distribution of scores (pushed towards the poles).

> > * Can I stop SA from judging spamminess (that is, making the binary
> > declaration of whether something is spam, X-Spam-Status,
> > X-Spam-Flag) and retain the scoring markup? I suppose this may not
> > be important, as sa-learn is said to ignore prior SA markup, it's
> > just that having the declaration sitting in the headers from there on
> > makes these mails look spammy whether they truly are, and other more
> > naive tools might be misled.
>
> Some third-party Baysian filters let you you ignore unwanted headers.
I think this response might mean that I can't stop SA adding X-Spam-Status and/or X-Spam-Flag, as the response proceeds without answering the question directly. I would like to have just the scoring without the judgements, but I suppose again this is not an issue with regard to future application of sa-learn. The only other markup I feel it's actually necessary to suppress is the subject markup.

> Even if you use one that doesn't, a single spam/ham token isn't likely
> to have all that much effect compared to all the other SA tokens. There

That's reassuring.

> are two main ways to use SA with a separate Bayesian filter. One is to
> score it into SA (which you can't do) and the other is to let the
> Bayesian filter pick-up extra tokens from the SA headers. In the latter
> case you would probably want to leave in the result at the default
> threshold anyway.

And I or another person should just remember while looking at these emails that the judgement is not necessarily correct. I guess I'm including myself (and other humans) among the naive tools to worry about.

> I think you could get rid of it by creating a custom header, but it's
> probably not worth the effort.

"It" here referring to the final spamminess judgement? Oh, sorry, I misunderstood earlier, then.

> > * If I can't stop SA from judging spamminess, can I at least
> > override the site-wide config to mark up subjects? I can't figure
> > this out. Currently I have 'rewrite_header subject ""', but that
> > fails. The docs say the string should be set to 'a null value', but
> > the config file's syntax for specifying nulls is not described.
>
> I believe it just means:
>
> rewrite_header subject

Ah, that's one of the permutations I tried. Any idea why it may not have worked? I've been able to modify required_score, as is evidenced by mail headers that come through, so I must be working in a picked-up config file.
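To make the `postconf` idea concrete, here is a minimal sketch of what such a viewer might do. It assumes (and this is only my assumption) that settings are plain `name value` lines and that whichever file is read last wins. `effective_config` and the sample file contents are made up for illustration. Real SA configuration has conditionals, options such as `score` and `rewrite_header` that take a sub-key, and settings a user may not be allowed to override, none of which this handles.

```python
from typing import Dict, Iterable, Tuple


def effective_config(
    sources: Iterable[Tuple[str, str]]
) -> Dict[str, Tuple[str, str]]:
    """Return {directive: (value, source_name)}; later sources override earlier.

    ASSUMPTION: one "name value" setting per line, last one read wins.
    This is a simplification, not how SpamAssassin itself merges config.
    """
    result: Dict[str, Tuple[str, str]] = {}
    for source_name, text in sources:
        for raw in text.splitlines():
            # Drop trailing comments and surrounding whitespace.
            line = raw.split("#", 1)[0].strip()
            if not line:
                continue
            parts = line.split(None, 1)
            directive = parts[0]
            value = parts[1] if len(parts) > 1 else ""
            result[directive] = (value, source_name)
    return result


# Hypothetical file contents, listed in the order they would be read.
site_cf = "required_score 5.0\nreport_safe 0\n"
user_prefs = "required_score 7.5  # my override\n"

merged = effective_config([("site", site_cf), ("user", user_prefs)])
for directive in sorted(merged):
    value, origin = merged[directive]
    print(f"{directive} = {value!r} (from {origin})")
```

Keeping the source name next to each value is the point of the exercise: it would show at a glance whether a setting came from my user file or from the site-wide one.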
(Again, a `sa-conf` to view live/final config would be much better for me than tweaking my user config file's required_score and then waiting for a spam to arrive so I can know if a config specification went into effect.)

My only guess now is that somehow site-wide config overrides user config for this item or that user config for this item is disallowed.

Right now SA's config'd to prepend "***SPAM*** ". But I don't see this string or the string "rewrite_header" anywhere in the discovered config directory or my user config.

So this is part of what I meant earlier about how my headers may be bad: the subjects. I can't imagine SA would know how to ignore this prepended string -- could it? The wiki article LearningMarkedUpMessages suggests that "Subject header tagged etc." are automatically removed for learning, but the POD says ``Note that you should only use the _REQD_ and _SCORE_ tags when rewriting the Subject header if "report_safe" is 0.'' I believe the site-wide config here is set to 0, judging by the shape of the mails that arrive in my inbox (on another system).

The other part of how my headers may be bad is that all the mail is getting forwarded to another host where I read it. Surely the additional header lines are part of why there's a warning in the wiki not to "train Bayes on different mail streams"? But maybe it's possible to strip the distinguishing headers?

Ah, maybe the solution is to set report_safe back to the default of 1? That seems like it would solve all these problems. Does that sound right?

Oh, hell, it looks like I am not allowed to set either rewrite_subject or report_safe. :(

Any advice? (Other than complain to system admins?)

Thanks,
RSK
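
P.S. To show what I mean by stripping the markup before handing mail to some other tool, here is a rough sketch using Python's standard `email` module. It assumes report_safe 0 style messages (markup in the headers only), a plain unencoded Subject, and the "***SPAM*** " prefix I see here. `strip_markup` and the `X-Example-Forwarded` header name are placeholders of mine. I don't actually know which headers the forwarding host adds.

```python
from email import message_from_string
from email.message import Message
from typing import Iterable

# The prefix this site currently prepends to subjects.
SUBJECT_TAG = "***SPAM*** "


def strip_markup(raw: str, extra_headers: Iterable[str] = ()) -> Message:
    """Remove X-Spam-* headers, the subject tag, and any extra named headers.

    ASSUMPTION: the message was tagged in place (report_safe 0) and the
    Subject is not MIME-encoded. `extra_headers` is whatever the caller
    believes the forwarding hop added.
    """
    msg = message_from_string(raw)
    unwanted = {name.lower() for name in extra_headers}

    for name in list(msg.keys()):
        lowered = name.lower()
        if lowered.startswith("x-spam-") or lowered in unwanted:
            # Deletes every occurrence; no error if already removed.
            del msg[name]

    subject = msg.get("Subject")
    if subject is not None and subject.startswith(SUBJECT_TAG):
        # replace_header keeps the header in its original position.
        msg.replace_header("Subject", subject[len(SUBJECT_TAG):])
    return msg


# Made-up sample message for illustration.
sample = (
    "Received: from a by b\n"
    "X-Spam-Flag: YES\n"
    "X-Example-Forwarded: yes\n"
    "Subject: ***SPAM*** Cheap pills\n"
    "\n"
    "body\n"
)
print(strip_markup(sample, extra_headers=["X-Example-Forwarded"]).as_string())
```

If the wiki is right that sa-learn already discards SA's own markup, this would only matter for the forwarding headers and for the more naive tools.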