On Tuesday 31 May 2016 at 15:32:49, Reindl Harald wrote:

> On 31.05.2016 at 15:28, Antony Stone wrote:
> > 2. You should be aware (*especially* if using this stuff as the basis of
> > a research project - any competent referee should pick up on something
> > like this) that SA works best when the emails it is asked to process are
> > from the same source as it has been trained with. In other words, you
> > shovel real emails through a real mail server and train SA using this
> > spam and ham; you then use that trained SA to assess mail passing through
> > that same mail server, for the same users. Anything significantly
> > varying from this is not going to work well, and is certainly not a good
> > test of how well SA works.
>
> not true - i heard similar nonsense about "you can't re-use your MX bayes
> database on a submission server" - i can, do and it works like a charm

Oh!

I had read SA documentation such as
https://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html which contains
comments such as:

"The pros of Bayesian spam analysis: Can greatly reduce false positives and
false negatives. - It learns from your mail, so it is tailored to your unique
e-mail flow."

"You're urged to avoid using a publicly available corpus (sample) - this must
be taken from YOUR mail server, if it is to be statistically useful.
Otherwise, the results may be pretty skewed."

If this sort of advice is incorrect, maybe a request should be raised with the
SA developers to update the official documentation?


Antony.

-- 
If the human brain were so simple that we could understand it, we'd be so
simple that we couldn't.

Please reply to the list; please *don't* CC me.