Re: Re[2]: possible HTML rules to delete

2004-04-16 Daniel Quinlan
Robert Menschel <[EMAIL PROTECTED]> writes:
> Corpus run with 2.63 distribution rules plus the above rule.
>
> S/O 0.918 (compared to global 0.813), over 6% of spam, significant ham.

Thanks. Can you run hit-frequencies as follows?

$ ./hit-frequencies -xpa -M 'HTML_MESSAGE|__MIME_HTML' -m 'LW_

Re[2]: possible HTML rules to delete

2004-04-16 Robert Menschel
Hello Daniel,

Tuesday, April 13, 2004, 6:46:10 PM, you wrote:

DQ> Loren Wilton <[EMAIL PROTECTED]> writes:
>> meta LW_BIG_AND_RED (HTML_FONT_BIG && HTML_FONTCOLOR_RED)
>> describe LW_BIG_AND_RED BIG RED TEXT
>> score LW_BIG_AND_RED 3

DQ> Someone with a corpus could certainly give it a s

Re: possible HTML rules to delete

2004-04-14 Daniel Quinlan
Loren Wilton <[EMAIL PROTECTED]> writes:
> meta LW_BIG_AND_RED (HTML_FONT_BIG && HTML_FONTCOLOR_RED)
> describe LW_BIG_AND_RED BIG RED TEXT
> score LW_BIG_AND_RED 3

Someone with a corpus could certainly give it a shot. It's speculative without a corpus run, though.

>>> The COLOR_UNSAFE

Re: possible HTML rules to delete

2004-04-14 Loren Wilton
> > The RED and BLUE tags seem moderately useful in conjunction with big
> > font checks.
>
> Perhaps, but we don't have a rule for that.

Yes, but I do. If the font color check goes away, then I won't, which will be a net loss for my spam checking abilities.

meta LW_BIG_AND_RED (HTML_FONT_BIG
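A SpamAssassin `meta` rule fires only when its boolean expression over other rules' results is true, and its `score` is then added to the message total. The Python sketch below models that idea for LW_BIG_AND_RED; the `hits` dictionary and the function names are hypothetical illustrations, not SpamAssassin's own (Perl) implementation.

```python
from typing import Dict

# Hypothetical model: which subrules fired on one message.
Hits = Dict[str, bool]

# Mirrors: score LW_BIG_AND_RED 3
LW_BIG_AND_RED_SCORE = 3.0


def lw_big_and_red(hits: Hits) -> bool:
    """Model of: meta LW_BIG_AND_RED (HTML_FONT_BIG && HTML_FONTCOLOR_RED)."""
    # In this sketch a subrule that did not fire, or does not exist, counts as false.
    return hits.get("HTML_FONT_BIG", False) and hits.get("HTML_FONTCOLOR_RED", False)


def meta_contribution(hits: Hits) -> float:
    """Score the meta rule adds to the message total (0.0 if it does not fire)."""
    return LW_BIG_AND_RED_SCORE if lw_big_and_red(hits) else 0.0


if __name__ == "__main__":
    print(meta_contribution({"HTML_FONT_BIG": True, "HTML_FONTCOLOR_RED": True}))
    print(meta_contribution({"HTML_FONT_BIG": True}))
```

In this model a deleted HTML_FONTCOLOR_RED subrule is simply absent from `hits`, so the meta expression can never be true, which is the loss Loren describes if the font color check goes away.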

Re: possible HTML rules to delete

2004-04-13 Daniel Quinlan
"Loren Wilton" <[EMAIL PROTECTED]> writes:
> The RED and BLUE tags seem moderately useful in conjunction with big
> font checks.

Perhaps, but we don't have a rule for that.

> The COLOR_UNSAFE rule would be very useful if it worked better. It
> seems to catch a lot of the fe and fefefe type
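Assuming, as the name suggests, that COLOR_UNSAFE tests whether a font color falls outside the 216-color "web-safe" palette (the thread does not define the rule), a color such as `#fefefe` would be flagged because `fe` is not one of the six safe component values 00, 33, 66, 99, cc, ff. The sketch below shows such a check; `is_web_safe` is a hypothetical helper, not SpamAssassin code.

```python
import re

# Component values of the 6x6x6 web-safe palette: multiples of 0x33.
SAFE_COMPONENTS = {0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF}

# Hypothetical: only 6-digit hex colors, with or without a leading '#'.
_HEX6 = re.compile(r"#?([0-9a-fA-F]{6})")


def is_web_safe(color: str) -> bool:
    """Return True if a #rrggbb color lies in the web-safe palette.

    Raises ValueError for anything that is not a 6-digit hex color; named
    colors and 3-digit shorthand are deliberately out of scope here.
    """
    m = _HEX6.fullmatch(color.strip())
    if m is None:
        raise ValueError(f"not a #rrggbb color: {color!r}")
    digits = m.group(1)
    # Split "rrggbb" into three 0-255 integers.
    components = [int(digits[i:i + 2], 16) for i in (0, 2, 4)]
    return all(c in SAFE_COMPONENTS for c in components)


if __name__ == "__main__":
    for c in ("#ff0000", "#0000ff", "#fefefe", "#ffffff"):
        print(c, "safe" if is_web_safe(c) else "unsafe")
```

Under this assumption pure white `#ffffff` passes while near-white `#fefefe` does not, so such a rule would only catch the "almost invisible" variants of hidden text.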

Re: possible HTML rules to delete

2004-04-13 Loren Wilton
The RED and BLUE tags seem moderately useful in conjunction with big font checks. I don't know why blue text is so popular; all I can guess is that it is the color of certain little pills. The other color cases could probably disappear with no great loss. The COLOR_UNSAFE rule would be very useful if

Re: possible HTML rules to delete

2004-04-13 Justin Mason
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Daniel Quinlan writes:
> These results are for the HTML_MESSAGE messages in our corpus.
>
> OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
>   186686   182745     3941    0.979   0.00    0.00  (all messages)
>  100.000  97.8890   2.1110    0.979   0.00    0.00  (all messages as %)
>  100.9

possible HTML rules to delete

2004-04-13 Daniel Quinlan
These results are for the HTML_MESSAGE messages in our corpus.

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  186686   182745     3941    0.979   0.00    0.00  (all messages)
 100.000  97.8890   2.1110    0.979   0.00    0.00  (all messages as %)

Anything with an S/O below 0.500 is hitti
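The `(all messages)` row can be checked by hand: 182745 spam plus 3941 ham gives 186686 messages, and the spam share 182745 / 186686 ≈ 0.979 matches the S/O column (ham is 2.1110% of the total). The sketch below treats S/O as the fraction of a rule's hits that land on spam and lists rules under the 0.500 cut-off. Whether hit-frequencies normalizes per-rule S/O for the spam/ham corpus sizes is not stated here, so `normalized_so_ratio` is an assumption, as are all function and rule names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class RuleCounts:
    name: str
    spam_hits: int  # spam messages the rule hit
    ham_hits: int   # ham messages the rule hit


def so_ratio(spam_hits: float, ham_hits: float) -> float:
    """Fraction of a rule's hits that are spam (0.0 if it never hit)."""
    total = spam_hits + ham_hits
    return spam_hits / total if total else 0.0


def normalized_so_ratio(spam_hits: int, ham_hits: int, n_spam: int, n_ham: int) -> float:
    """ASSUMPTION: divide by corpus sizes first, so a spam-heavy corpus
    does not push every rule's S/O toward 1.0."""
    return so_ratio(spam_hits / n_spam, ham_hits / n_ham)


def deletion_candidates(rules: Iterable[RuleCounts], threshold: float = 0.5,
                        corpus: Optional[Tuple[int, int]] = None) -> List[str]:
    """Names of rules whose S/O is below threshold.

    If corpus=(n_spam, n_ham) is given, the normalized variant is used.
    """
    out = []
    for r in rules:
        if corpus is None:
            ratio = so_ratio(r.spam_hits, r.ham_hits)
        else:
            ratio = normalized_so_ratio(r.spam_hits, r.ham_hits, *corpus)
        if ratio < threshold:
            out.append(r.name)
    return out


if __name__ == "__main__":
    n_spam, n_ham = 182745, 3941  # the "(all messages)" row above
    print(f"S/O {so_ratio(n_spam, n_ham):.3f}")
    print(f"ham {100 * n_ham / (n_spam + n_ham):.4f}%")
```

The choice matters: on a corpus that is 98% spam, a rule hitting 90 spam and 10 ham passes the raw cut-off easily, but if those 90 spam hits come from ten times as many spam messages as ham, the normalized S/O drops below 0.5.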