Robert Menschel <[EMAIL PROTECTED]> writes:
> Corpus run with 2.63 distribution rules plus the above rule.
>
> S/O 0.918 (compared to global 0.813), over 6% of spam, significant ham.
Thanks. Can you run hit-frequencies as follows?
$ ./hit-frequencies -xpa -M 'HTML_MESSAGE|__MIME_HTML' -m
'LW_
Hello Daniel,
Tuesday, April 13, 2004, 6:46:10 PM, you wrote:
DQ> Loren Wilton <[EMAIL PROTECTED]> writes:
>> meta LW_BIG_AND_RED (HTML_FONT_BIG && HTML_FONTCOLOR_RED)
>> describe LW_BIG_AND_RED BIG RED TEXT
>> score LW_BIG_AND_RED 3
DQ> Someone with a corpus could certainly give it a s
Loren Wilton <[EMAIL PROTECTED]> writes:
> meta LW_BIG_AND_RED (HTML_FONT_BIG && HTML_FONTCOLOR_RED)
> describe LW_BIG_AND_RED BIG RED TEXT
> score LW_BIG_AND_RED 3
Someone with a corpus could certainly give it a shot. It's speculative
without a corpus run, though.
>>> The COLOR_UNSAFE
> > The RED and BLUE tags seem moderately useful in conjunction with big
> > font checks.
>
> Perhaps, but we don't have a rule for that.
Yes, but I do. If the font color check goes away, then I won't, which will
be a net loss for my spam checking abilities.
meta LW_BIG_AND_RED (HTML_FONT_BIG
"Loren Wilton" <[EMAIL PROTECTED]> writes:
> The RED and BLUE tags seem moderately useful in conjunction with big
> font checks.
Perhaps, but we don't have a rule for that.
> The COLOR_UNSAFE rule would be very useful if it worked better. It
> seems to catch a lot of the fe and fefefe type
The RED and BLUE tags seem moderately useful in conjunction with big font
checks.
I don't know why blue text is so popular; all I can guess is that it is the color
of certain little pills. The other color cases could probably disappear
with no great loss.
The COLOR_UNSAFE rule would be very useful if
Daniel Quinlan writes:
> These results are for the HTML_MESSAGE messages in our corpus.
>
> OVERALL%    SPAM%     HAM%    S/O   RANK  SCORE  NAME
>  186686    182745     3941  0.979   0.00   0.00  (all messages)
> 100.000   97.8890   2.1110  0.9
These results are for the HTML_MESSAGE messages in our corpus.
OVERALL%    SPAM%     HAM%    S/O   RANK  SCORE  NAME
 186686    182745     3941  0.979   0.00   0.00  (all messages)
100.000   97.8890   2.1110  0.979   0.00   0.00  (all messages as %)
Anything with an S/O below 0.500 is hitti
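
For anyone checking the numbers by hand, here is a rough sketch of how the S/O column in the "(all messages)" rows appears to be derived. It assumes S/O is simply spam hits divided by spam plus ham hits, and it reads the first row as 186686 total, 182745 spam and 3941 ham. The function names are made up for illustration; this is not the hit-frequencies code itself.

```python
def so_ratio(spam_hits, ham_hits):
    """Spam/overall ratio: the fraction of all hits that are spam.

    Assumption: S/O = spam / (spam + ham). This reproduces the
    "(all messages)" row quoted above; it is not taken from the
    hit-frequencies source.
    """
    total = spam_hits + ham_hits
    if total == 0:
        return 0.0  # no hits at all, so there is no meaningful ratio
    return spam_hits / total


def as_percent(hits, total):
    """Share of the corpus, as a percentage."""
    return 100.0 * hits / total


if __name__ == "__main__":
    spam, ham = 182745, 3941   # counts read from the table above
    total = spam + ham         # 186686, matching the OVERALL column
    print(f"S/O   = {so_ratio(spam, ham):.3f}")
    print(f"SPAM% = {as_percent(spam, total):.4f}")
    print(f"HAM%  = {as_percent(ham, total):.4f}")
```

Under that reading, a value near 1.0 means almost every hit is spam and a value near 0.0 means almost every hit is ham, with 0.500 as the midpoint between the two.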