Re: config no subject rewrite, learning spam headers

Ray Fri, 20 Feb 2009 23:53:22 -0800

RW <rwmaillists <at> googlemail.com> writes:

> On Wed, 18 Feb 2009 23:00:03 +0000 (UTC)
> Ray <rsk-gmane <at> misinformation.org> wrote:
> > * How do I determine what the current SA config is?
>
> The locations where spamassassin looks for configuration are listed in
> the main manpage.


I managed to find the config directory on this system, thanks for the pointer.
I guess I have to parse all of these files to know how SA is actually config'd?
Alas, I was hoping for something like Postfix's `postconf` to show the
active/final configuration in its entirety.

Where can one submit a feature request, and does this sound like a sensible one?
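
To make the request concrete, here is a minimal sketch of the merged view I have in mind. It assumes a simple "last file read wins" rule and tracks only a handful of single-valued options. The names `effective_config`, `read_sources` and `SINGLE_VALUED` are my own inventions, not anything SA ships, and the sketch knows nothing about settings the site may forbid users to change.

```python
import os

# Assumption: only simple "option value" settings are tracked in this sketch.
SINGLE_VALUED = {"required_score", "report_safe", "use_bayes", "bayes_auto_learn"}

def effective_config(sources):
    """sources: iterable of (name, text) pairs in the order SA would read them.

    Returns {option: (value, name)}; a later source overrides an earlier one.
    """
    settings = {}
    for name, text in sources:
        for raw in text.splitlines():
            line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
            if not line:
                continue
            parts = line.split(None, 1)
            option = parts[0]
            value = parts[1].strip() if len(parts) > 1 else ""
            if option in SINGLE_VALUED:
                settings[option] = (value, name)
    return settings

def read_sources(paths):
    """Read the files that exist into (path, text) pairs, skipping missing ones."""
    pairs = []
    for path in paths:
        if os.path.isfile(path):
            with open(path, encoding="utf-8", errors="replace") as fh:
                pairs.append((path, fh.read()))
    return pairs
```

Feeding `read_sources()` the site-wide `*.cf` files followed by my user config, then printing the result of `effective_config()`, would show each tracked option with the file that set it last.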

> If it appears to be autolearning, then bayes and autolearning are enabled.

The magically incrementing `sa-learn --dump magic|grep am` values suggest so.
It's odd that there isn't any indication in the "X-Spam-Status" header that this
is happening, as one would expect after reading the wiki article
AutolearnNotWorking.
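
For checking a saved message, a small sketch like the following would pull out the field I expected to see. It assumes the site's "X-Spam-Status" template carries an `autolearn=` field (as I understand the stock template does); the function name `autolearn_status` is made up for illustration.

```python
import re
from email import message_from_string

def autolearn_status(raw_message):
    """Return the autolearn= value from X-Spam-Status, or None if it is absent."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    # The header is usually folded over several lines; collapse the whitespace.
    status = " ".join(status.split())
    match = re.search(r"\bautolearn=(\S+)", status)
    return match.group(1) if match else None
```

A result of `None` on mail that has the header would match what I am seeing here: the status line is present, but says nothing about autolearning.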

> Note that autolearning uses its own, more conservative, rules, it's not
> based on the normal single threshold - you should use sa-learn to
> manually train too, if you can.

I noticed the additional thresholds for autolearning.  I was hoping to do manual
training only, but maybe that level of control is just not achievable in my
circumstance.  (The problem being that my headers may be bad for sa-learn.)

> By default Bayes scoring won't turn on until you've learned 200 spam,
> and 200 ham (non-spam) messages. If you are going to make a judgement
> about moving the threshold then you should ignore the early mails that
> lack BAYES_* hits.

I imagine after Bayes scoring goes into effect I'll have a nicer distribution of
scores (pushed towards the poles).
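
To see how far I am from that point, something like this sketch could read the counters out of `sa-learn --dump magic`. It assumes the "non-token data" lines keep the count in the third whitespace-separated column, which is how the output looks on this system. The names `learned_counts` and `bayes_active` and the `min_spam`/`min_ham` parameters are hypothetical; the defaults of 200 come from the figures quoted above.

```python
def learned_counts(dump_text):
    """Extract the nspam/nham counters from `sa-learn --dump magic` output."""
    counts = {}
    for line in dump_text.splitlines():
        if "non-token data:" not in line:
            continue
        fields, _, label = line.partition("non-token data:")
        label = label.strip()
        if label in ("nspam", "nham"):
            # Assumed layout: the count sits in the third column.
            counts[label] = int(fields.split()[2])
    return counts

def bayes_active(counts, min_spam=200, min_ham=200):
    """True once both counters have reached the quoted 200-message minimum."""
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham
```

Piping the dump into `learned_counts()` and then `bayes_active()` would tell me whether mails arriving now should already carry BAYES_* hits.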

> >   * Can I stop SA from judging spamminess (that is, making the binary
> >     declaration of whether something is spam, X-Spam-Status,
> > X-Spam-Flag) and retain the scoring markup?  I suppose this may not
> > be important, as sa-learn is said to ignore prior SA markup, it's
> > just that having the declaration sitting in the headers from there on
> > makes these mails look spammy whether they truly are, and other more
> > naive tools might be misled.
>
> Some third-party Bayesian filters let you ignore unwanted headers.

I think this response might mean that I can't stop SA adding X-Spam-Status
and/or X-Spam-Flag, as the response proceeds without answering the question
directly.  I would like to have just the scoring without the judgements, but I
suppose again this is not an issue with regard to future application of
sa-learn.

The only other markup I feel it's actually necessary to hinder is the subject
markup.

> Even if you use one that doesn't, a single spam/ham token isn't likely
> to have all that much effect compared to all the other SA tokens. There

That's reassuring.

> are two main ways to use SA with a separate Bayesian filter. One is to
> score it into SA (which you can't do) and the other is to let the
> Bayesian filter pick-up extra tokens from the SA headers. In the latter
> case you would probably want to leave in the result at the default
> threshold anyway.

And I or another person should just remember while looking at these emails that
the judgement is not necessarily correct.  I guess I'm including myself (and
other humans) among the naive tools to worry about.

> I think you could get rid of it by creating a custom header, but it's
> probably not worth the effort.

"It" here referring to the final spamminess judgement?  Oh, sorry, I
misunderstood earlier, then.

> >   * If I can't stop SA from judging spamminess, can I at least
> > override the site-wide config to mark up subjects?  I can't figure
> > this out.  Currently I have 'rewrite_header  subject  ""', but that
> > fails.  The docs say the string should be set to 'a null value', but
> > the config file's syntax for specifying nulls is not described.
>
> I believe it just means:
>
> rewrite_header  subject

Ah, that's one of the permutations I tried.  Any idea why it may not have
worked?  I've been able to modify required_score, as is evidenced by mail
headers that come through, so I must be working in a picked-up config file.
(Again, a hypothetical `sa-conf` to view live/final config would be much better for me than
tweaking my user config file's required_score and then waiting for a spam to
arrive so I can know if a config specification went into effect.)  My only guess
now is that somehow site-wide config overrides user config for this item or that
user config for this item is disallowed.

Right now SA's config'd to prepend "***SPAM*** ".  But I don't see this string
or the string "rewrite_header" anywhere in the discovered config directory or my
user config.

So this is part of what I meant earlier about how my headers may be bad: the
subjects.  I can't imagine SA would know how to ignore this prepended string --
could it?  The wiki article LearningMarkedUpMessages suggests that "Subject
header tagged etc." are automatically removed for learning, but the POD says
``Note that you should only use the _REQD_ and _SCORE_ tags when rewriting the
Subject header if "report_safe" is 0.''  I believe the site-wide config here is
set to 0 judging by the shape of the mails that arrive in my inbox (on another
system).

The other part of how my headers may be bad is that all the mail is getting
forwarded to another host where I read it.  Surely the additional header lines
are part of why there's a warning in the wiki not to "train Bayes on different
mail streams"?  But maybe it's possible to strip the distinguishing headers?
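
If stripping turns out to be necessary, a rough sketch of what I would run on each message before handing it to any learner is below. It assumes report_safe is 0 (so the original message is not wrapped), that the tag is exactly the "***SPAM*** " prefix seen here, and that every SA-added header starts with "X-Spam-". The name `strip_sa_markup` is my own, and the sketch does nothing about the extra forwarding headers.

```python
from email import message_from_string

SUBJECT_TAG = "***SPAM*** "  # the prefix this site currently prepends

def strip_sa_markup(raw_message, tag=SUBJECT_TAG):
    """Remove the subject tag and all X-Spam-* headers from one raw message."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject")
    if subject is not None and subject.startswith(tag):
        msg.replace_header("Subject", subject[len(tag):])
    for name in list(msg.keys()):
        if name.lower().startswith("x-spam-"):
            del msg[name]  # deletes every occurrence; harmless if already gone
    return msg.as_string()
```

Whether this is needed for sa-learn itself is exactly my question, since the wiki suggests it already ignores such markup; the sketch matters more for the other, more naive tools.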

Ah, maybe the solution is to set report_safe back to the default of 1?  That
seems like it would solve all these problems.  Does that sound right?

Oh, hell, it looks like I am not allowed to set either rewrite_header or
report_safe.  :(  Any advice?  (Other than complain to system admins?)

Thanks,

RSK
