Joseph Brennan writes:
>
>
> --On Sunday, September 21, 2008 18:39 -0600 Bob Proulx <[EMAIL PROTECTED]>
> wrote:
>
> >>  OVERALL    SPAM%     HAM%     S/O    RANK   SCORE  NAME
> >>    1.116   1.5957   0.2705   0.855    0.51    2.08  SUBJ_ALL_CAPS
> >
> > Am I reading that correctly to see that in spam all caps showed up in
> > 1.60% of the regression corpus and only in 0.27% of the non-spam?
> > Gosh that seems like a very small indicator.
>
>
> No, it's high. Only 1.87% had all caps subject, but of those 85%
> were spam: 1.60 / 1.87.
>
> If I am reading correctly.
That's right.

The problem with SUBJ_ALL_CAPS is that it tends to catch really odd fraud spams, foreign-language spam, etc. that the other rules fail to spot. This means the GA (the genetic algorithm that assigns the rule scores) likes it quite a lot: despite the occasional false positive, it reduces false negatives enough to make it "worth it". It's hard to avoid this issue. :(

--j.
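The arithmetic in the quoted exchange can be checked directly against the SUBJ_ALL_CAPS row. The sketch below assumes that the S/O column is SPAM% / (SPAM% + HAM%), which reproduces the 0.855 shown in that row; the function name `s_over_o` is hypothetical and not part of SpamAssassin.

```python
def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Fraction of a rule's (rate-normalized) hits that land on spam.

    Hypothetical helper: assumes S/O = SPAM% / (SPAM% + HAM%), which
    matches the SUBJ_ALL_CAPS row quoted above.
    """
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule never fired in either corpus")
    return spam_pct / total


# Figures taken from the SUBJ_ALL_CAPS row of the quoted output.
spam_pct, ham_pct = 1.5957, 0.2705
print(f"SPAM% + HAM% = {spam_pct + ham_pct:.2f}")  # 1.87
print(f"S/O = {s_over_o(spam_pct, ham_pct):.3f}")  # 0.855
```

Because both inputs are per-corpus percentages, the ratio does not depend on how many spam and ham messages were in the regression corpus; a rule that fires on a small fraction of mail can still have a high S/O.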