http://bugzilla.spamassassin.org/show_bug.cgi?id=3154
------- Additional Comments From [EMAIL PROTECTED] 2004-03-11 01:16 -------

Hmmm... those generally sound like good changes, but the patch as a whole
seems to have a somewhat negative effect on rule results. Testing 5496 spam
and 5497 ham, my spam hits changed as follows:

------- start of cut text --------------
 21 HTML_40_50
 21 LINES_OF_YELLING_2
 12 HTML_10_20
 12 HTML_70_80
  2 HTML_IMAGE_ONLY_02
  2 HTML_IMAGE_ONLY_08
  2 LINES_OF_YELLING
  2 __HTML_COMMENT_RATIO
  1 DEAR_FRIEND
  1 HTML_00_10
  1 HTML_IMAGE_ONLY_04
  1 HTML_IMAGE_RATIO_02
  1 HTML_IMAGE_RATIO_10
  1 HTML_IMAGE_RATIO_14
  1 LINES_OF_YELLING_3
 -1 HTML_50_60
 -1 HTML_IMAGE_RATIO_06
 -1 HTML_IMAGE_RATIO_12
 -1 T_BACKHAIR2_2_2
 -1 T_BACKHAIR2_2_3
 -1 T_BACKHAIR2_2_4
 -1 T_BACKHAIR2_4_7
 -1 T_BACKHAIR2_5_2
 -1 T_BACKHAIR2_7_6
 -1 T_BACKHAIR_1_4
 -1 T_BACKHAIR_1_5
 -1 T_BACKHAIR_4_7
 -1 T_BACKHAIR_6_5
 -1 T_BACKHAIR_7_6
 -1 T_DOMAIN_RATIO_001
 -1 T_DOMAIN_RATIO_002
 -1 T_DOMAIN_RATIO_004
 -1 T_DOMAIN_RATIO_010
 -1 T_DOMAIN_RATIO_012
 -1 T_DOMAIN_RATIO_016
 -1 T_DOMAIN_RATIO_018
 -1 T_DOMAIN_RATIO_025
 -1 T_DOMAIN_RATIO_029
 -1 T_DOMAIN_RATIO_032
 -1 T_DOMAIN_RATIO_033
 -1 T_DOMAIN_RATIO_034
 -1 T_DOMAIN_RATIO_046
 -1 T_DOMAIN_RATIO_047
 -1 T_DOMAIN_RATIO_048
 -1 T_DOMAIN_RATIO_049
 -1 T_RM_BPT_LONGWORDS_7_8_A
 -1 UNIQUE_WORDS
 -2 HTML_30_40
 -2 HTML_IMAGE_ONLY_06
 -2 T_BACKHAIR2_4_6
 -2 T_BACKHAIR2_6_2
 -2 T_BACKHAIR2_6_7
 -2 T_BACKHAIR_4_4
 -2 T_BACKHAIR_4_6
 -2 T_BACKHAIR_6_2
 -2 T_BACKHAIR_6_7
 -2 T_DOMAIN_RATIO_038
 -3 T_BACKHAIR2_4_2
 -3 T_BACKHAIR2_4_4
 -3 T_BACKHAIR2_6_1
 -3 T_BACKHAIR2_6_5
 -3 T_BACKHAIR_4_2
 -3 T_BACKHAIR_6_1
 -4 T_DOMAIN_RATIO_003
 -5 T_BACKHAIR_4_5
 -6 HTML_20_30
 -6 T_BACKHAIR2_3_7
 -9 HTML_60_70
-11 HTML_80_90
-14 FREE_PORN
-14 T_BACKHAIR2_4_5
-16 T_BACKHAIR2_1_6
-17 HTML_90_100
-32 HTML_OBFUSCATE_00_10
------- end ----------------------------

and ham changed as follows:

------- start of cut text --------------
 10 HTML_OBFUSCATE_00_10
  4 HTML_30_40
  3 HTML_10_20
  3 HTML_70_80
 -1 HTML_40_50
 -2 HTML_60_70
 -3 HTML_20_30
 -4 HTML_80_90
------- end ----------------------------

It seems like the additional whitespace is breaking the tests that rely on
looking for text directly before or after tags, especially the BACKHAIR* and
HTML_OBFUSCATE_* rules. We might want to rethink *how* and *when* formatting
text (like the whitespace and such) is added, perhaps develop a better
understanding of how we expect text to be surrounded by tags, or both.
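
To illustrate the suspected failure mode (whitespace added around tags defeating tests that expect text directly next to a tag), here is a minimal Python sketch. The pattern `ADJACENT_TEXT` and the helper `pad_tags` are hypothetical stand-ins made up for this example; they are not the actual BACKHAIR* or HTML_OBFUSCATE_* rules, and not the patch's real formatting code.

```python
import re

# Hypothetical stand-in for a rule that looks for a short run of letters
# squeezed directly between two tags. NOT an actual SpamAssassin rule.
ADJACENT_TEXT = re.compile(r"<[^>]+>[a-z]{1,3}<[^>]+>")


def pad_tags(html: str) -> str:
    """Hypothetical formatting step: put a space on each side of every tag."""
    return re.sub(r"(<[^>]+>)", r" \1 ", html)


# A word split up by tags, as in obfuscated spam.
sample = "v<b>ia</b>gra"

# Before padding, the letters sit directly between the tags, so the test hits.
print(bool(ADJACENT_TEXT.search(sample)))

# After padding, a space separates the tags from the letters, so it misses.
print(bool(ADJACENT_TEXT.search(pad_tags(sample))))
```

Under these assumptions the first search matches and the second does not, which is the kind of lost hit the negative counts above would be consistent with. Whether the real rules fail for exactly this reason would need checking against the rule definitions.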
