-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kevin Golding wrote on 21.1.2017 21:22:
> On Sat, 21 Jan 2017 19:08:39 -0000, Jari Fredriksson <ja...@iki.fi> wrote:
>
>> John Hardin wrote on 20.1.2017 22:38:
>>
>>> Collecting spam after RBL filtering is much less helpful to masscheck.
>>> Ideally your spam corpus is from a totally unfiltered feed.
>>>
>>> However, even if it is filtered and small, it helps, *especially* if
>>> the ham is not in English - masscheck is perennially starved for
>>> non-English ham and rule scoring is thus biased against non-English
>>> languages to a degree.
>>
>> This is NOT what I have learned from SA lists. I used to do this, but
>> learned in SA discussions that it is *harmful* to pass such spam to
>> masscheck. That it harms the SA users doing proper pre-SA filtering.
>>
>> We do *need* an official policy! What are we going to do with mixed
>> messages like this??
>
> It was written down once. I saw the unfiltered thing again when I
> looked earlier today, but I can't spot it just now. I believe I was
> also told by someone who knows this stuff that it wasn't a
> requirement, more an ideal.
>
> However, looking for that comment again just now, I noticed another
> discrepancy on the wiki:
>
> https://wiki.apache.org/spamassassin/CorpusCleaning - no spam older
> than 2 months
>
> https://wiki.apache.org/spamassassin/HandClassifiedCorpora - no spam
> older than 6 months
>
> I don't think either is actually a strict rule. It will help lower the
> barrier to entry if we can make this stuff more uniform. It could
> also be argued that having two such similar pages is somewhat
> redundant actually.
What does CorpusCleaning, which is about cleaning garbage out of a corpus, have to do with this? Really confused now.

- --
ja...@iki.fi
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iEYEARECAAYFAliDwMEACgkQKL4IzOyjSrZVegCeP+YQcK6s4AlHb4iTqbzUtige
ZTAAoKFGolEuLmElzqZu1KT3+RmMm/s2
=mDIS
-----END PGP SIGNATURE-----