On Tuesday 31 May 2016 at 15:32:49, Reindl Harald wrote:

> On 31.05.2016 at 15:28, Antony Stone wrote:
> > 2. You should be aware (*especially* if using this stuff as the basis of
> > a research project - any competent referee should pick up on something
> > like this) that SA works best when the emails it is asked to process are
> > from the same source as it has been trained with.  In other words, you
> > shovel real emails through a real mail server and train SA using this
> > spam and ham; you then use that trained SA to assess mail passing through
> > that same mail server, for the same users.  Anything significantly
> > varying from this is not going to work well, and is certainly not a good
> > test of how well SA works.
> 
> not true - i heard similar nonsense about "you can't re-use your MX bayes
> database on a submission server" - i can, do and it works like a charm

Oh!

I had read SA documentation such as
https://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html which contains 
comments such as:

"The pros of Bayesian spam analysis:
Can greatly reduce false positives and false negatives.
 - It learns from your mail, so it is tailored to your unique e-mail flow."

"You're urged to avoid using a publicly available corpus (sample) - this must 
be taken from YOUR mail server, if it is to be statistically useful. 
Otherwise, the results may be pretty skewed."


If this sort of advice is incorrect, maybe a request should be raised with the 
SA developers to update the official documentation?


Antony.

-- 
If the human brain were so simple that we could understand it,
we'd be so simple that we couldn't.

                                                   Please reply to the list;
                                                         please *don't* CC me.
