----- Original Message -----
From: "Robert Menschel" <[EMAIL PROTECTED]>
To: "Sandy S" <[EMAIL PROTECTED]>
Cc: "Thomas Kinghorn" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, August 03, 2004 2:42 PM
Subject: Re[2]: more are more junk getting through

> Hello Sandy,
>
> Tuesday, August 3, 2004, 6:43:13 AM, you wrote:
>
> SS> Tom -
> SS> I've seen these too, and have put in the following rules to try and catch
> SS> them. They've helped, but a few are still slipping through. I think if you
> SS> raise the scores some they'd catch a lot more, but I need to be a little
> SS> more sure I won't get FPs before I do this!
>
> Sandy, results of my mass-check here:
>
> Section 3 -- Frequencies Log
> (First numeric frequencies, followed by percentage frequencies)
>
> OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
>   58315    33581    24734    0.576   0.00    0.00  (all messages)
>     668      462      206    0.623   1.00    0.60  ODD_CHAR_COMMA_BA
>     287      249       38    0.828   0.89    0.60  ODD_CHAR_CARET_BA
>     557      261      296    0.394   0.78    0.60  ODD_CHAR_DOT_BA
>      75       44       31    0.511   0.44    0.60  ODD_CHAR_TIC1_BA
>     887      197      690    0.174   0.33    0.60  ODD_CHAR_UNDERSCORE_BA
>    6363      240     6123    0.028   0.11    0.60  ODD_CHAR_TILDE_BA
>    5988      194     5794    0.024   0.00    0.60  ODD_CHAR_DASH_BA
>    1151      183      968    0.122   0.00    0.60  ODD_CHAR_TIC2_BA
>
> OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
>   58315    33581    24734    0.576   0.00    0.00  (all messages)
> 100.000  57.5855  42.4145    0.576   0.00    0.00  (all messages as %)
>   1.146   1.3758   0.8329    0.623   1.00    0.60  ODD_CHAR_COMMA_BA
>   0.492   0.7415   0.1536    0.828   0.89    0.60  ODD_CHAR_CARET_BA
>   0.955   0.7772   1.1967    0.394   0.78    0.60  ODD_CHAR_DOT_BA
>   0.129   0.1310   0.1253    0.511   0.44    0.60  ODD_CHAR_TIC1_BA
>   1.521   0.5866   2.7897    0.174   0.33    0.60  ODD_CHAR_UNDERSCORE_BA
>  10.911   0.7147  24.7554    0.028   0.11    0.60  ODD_CHAR_TILDE_BA
>  10.268   0.5777  23.4252    0.024   0.00    0.60  ODD_CHAR_DASH_BA
>   1.974   0.5450   3.9136    0.122   0.00    0.60  ODD_CHAR_TIC2_BA
>
> All these rules hit significant ham. Indeed, UNDERSCORE_BA, DOT_BA,
> TILDE_BA, DASH_BA, and TIC2_BA hit more ham than spam in my corpus.
>
> Bob Menschel
>

Ouch! Thanks for running the check. Obviously I will need to keep these
scored very low if I continue to use them.

Sandy
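
For anyone re-deriving the S/O column in Bob's table: the figures are consistent with S/O = SPAM% / (SPAM% + HAM%), where each percentage is taken against that class's own corpus size (33581 spam, 24734 ham). The sketch below is only an illustration and is not the mass-check or hit-frequencies code. The names s_o, RULE_HITS, TOTAL_SPAM and TOTAL_HAM are made up for this example, and the counts are copied from the table above.

```python
# Corpus sizes from the "(all messages)" row of the table.
TOTAL_SPAM = 33581
TOTAL_HAM = 24734

# Per-rule (spam hits, ham hits), copied from the numeric table.
RULE_HITS = {
    "ODD_CHAR_COMMA_BA": (462, 206),
    "ODD_CHAR_CARET_BA": (249, 38),
    "ODD_CHAR_DOT_BA": (261, 296),
    "ODD_CHAR_TIC1_BA": (44, 31),
    "ODD_CHAR_UNDERSCORE_BA": (197, 690),
    "ODD_CHAR_TILDE_BA": (240, 6123),
    "ODD_CHAR_DASH_BA": (194, 5794),
    "ODD_CHAR_TIC2_BA": (183, 968),
}


def s_o(spam_hits, ham_hits, total_spam=TOTAL_SPAM, total_ham=TOTAL_HAM):
    """Assumed formula: spam hit rate divided by (spam + ham hit rates)."""
    spam_pct = 100.0 * spam_hits / total_spam
    ham_pct = 100.0 * ham_hits / total_ham
    if spam_pct + ham_pct == 0:
        return 0.0  # rule never fired; no ratio to report
    return spam_pct / (spam_pct + ham_pct)


def ham_heavy_rules(rule_hits):
    """Rules whose S/O is below 0.5, i.e. they fire on a larger share of ham."""
    return sorted(name for name, (s, h) in rule_hits.items() if s_o(s, h) < 0.5)


if __name__ == "__main__":
    for name, (spam_hits, ham_hits) in RULE_HITS.items():
        print(f"{s_o(spam_hits, ham_hits):.3f}  {name}")
```

A rule with S/O below 0.5 fires on a larger share of ham than of spam. On these counts that picks out the same five rules Bob names (UNDERSCORE_BA, DOT_BA, TILDE_BA, DASH_BA, TIC2_BA).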
