Ross Vandegrift said:

> Obviously an interim solution is to whitelist, but long term is probably
> harder.  What kind of SCE, like MSDN newsletters, product updates, etc
> is in the corpus for the GA?  Maybe seeding the corpus with a bigger set
> of these type of mails will have some interesting/useful results.

Yes, this is my position.  That way, patterns which overmatch SCE (and
thereby cause FPs) will be penalized heavily.
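
To illustrate the idea, here is a toy sketch in Python.  It is not the
actual GA code; the rule names, the threshold and the FP weight are all
made up for illustration.  It only shows that a score set which overmatches
SCE looks fine until nonspam SCE is added to the corpus, at which point it
is penalized.

```python
# Toy illustration only; NOT SpamAssassin's real GA or scoring code.
# All names and numbers below are assumptions made for the example.
FP_WEIGHT = 5.0   # assumed: one false positive costs as much as 5 false negatives
THRESHOLD = 5.0   # assumed spam threshold


def total_cost(scores, corpus):
    """Cost of a candidate score set over a corpus.

    corpus is a list of (is_spam, set_of_rule_names_hit) pairs.
    """
    cost = 0.0
    for is_spam, hits in corpus:
        total = sum(scores.get(rule, 0.0) for rule in hits)
        flagged = total >= THRESHOLD
        if flagged and not is_spam:
            cost += FP_WEIGHT      # false positive
        elif not flagged and is_spam:
            cost += 1.0            # false negative
    return cost


# Hypothetical rules: UNSUB_LINK also fires on legitimate newsletters.
spam = [(True, {"UNSUB_LINK", "FREE_OFFER"})] * 3
plain_ham = [(False, set())] * 3
sce_ham = [(False, {"UNSUB_LINK"})] * 4

overmatching = {"UNSUB_LINK": 5.0, "FREE_OFFER": 1.0}
balanced = {"UNSUB_LINK": 1.0, "FREE_OFFER": 4.0}

cost_without_sce = total_cost(overmatching, spam + plain_ham)
cost_with_sce = total_cost(overmatching, spam + plain_ham + sce_ham)
cost_balanced = total_cost(balanced, spam + plain_ham + sce_ham)
```

Without the SCE ham, both score sets cost the same, so nothing pushes the
optimizer away from the overmatching one.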

As a result, I'm now collecting a corpus of nonspam SCEs, using a few
common FP'ing SCE sources.  BTW, if anyone knows of more FP'ing SCE
sources, I'd appreciate it if they posted them; the more the better.

I'm currently getting C|Net, ZDNet, LockerGnome, Cramsession and The
Guardian (UK newspaper).  I tried MSDN, but Passport.com doesn't like
Mozilla, it seems ;)

--j.

-- 
'Justin Mason' => { url => 'http://jmason.org/', blog => 'http://taint.org/' }


_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
