-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kevin Golding wrote on 21.1.2017 at 21:22:
> On Sat, 21 Jan 2017 19:08:39 -0000, Jari Fredriksson <ja...@iki.fi> wrote:
> 
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>> 
>> John Hardin wrote on 20.1.2017 at 22:38:
>> 
>>> Collecting spam after RBL filtering is much less helpful to masscheck.
>>> Ideally your spam corpus is from a totally unfiltered feed.
>>> 
>>> However, even if it is filtered and small, it helps, *especially* if
>>> the ham is not in English - masscheck is perennially starved for
>>> non-English ham and rule scoring is thus biased against non-English
>>> languages to a degree.
>> 
>> This is NOT what I have learned from the SA lists. I used to do this,
>> but learned in SA discussions that it is *harmful* to pass such spam to
>> masscheck, because it harms the SA users who do proper pre-SA filtering.
>> 
>> We do *need* an official policy! What are we going to do with mixed
>> messages like this??
> 
> It was written down once. I saw the unfiltered thing again when I
> looked earlier today, but I can't spot it just now. I believe I was
> also told by someone who knows this stuff that it wasn't a
> requirement, more an ideal.
> 
> However looking for that comment again just now I registered another
> discrepancy on the wiki:
> 
> https://wiki.apache.org/spamassassin/CorpusCleaning - no spam older
> than 2 months
> 
> https://wiki.apache.org/spamassassin/HandClassifiedCorpora - no spam
> older than 6 months
> 
> I don't think either is actually a strict rule. It will help lower the
> barrier to entry if we can make this stuff more uniform. It could
> also be argued that having two such similar pages is somewhat
> redundant.

What does CorpusCleaning (cleaning garbage out of a corpus) have to do
with this? I'm really confused now.

- -- 
ja...@iki.fi
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iEYEARECAAYFAliDwMEACgkQKL4IzOyjSrZVegCeP+YQcK6s4AlHb4iTqbzUtige
ZTAAoKFGolEuLmElzqZu1KT3+RmMm/s2
=mDIS
-----END PGP SIGNATURE-----
