Sorry -- didn't get that attachment attached...

RM> Hello Alton,
RM> Friday, March 19, 2004, 2:52:22 PM, you wrote:
AD>> Would someone give these rules a run and see how much trouble they cause?
RM> My corpus results attached.
RM> CTS_CONFIDENTIAL has problems, as does CTS_PROPOSITION. CTS_ANONYMOUS is
RM> questionable, but might be useful -- sample of hits is small.
RM> CTS_APOLOGY hits only ham here. CTS_VIRUSWARN2 hit only ham, someone's
RM> response to a person fooled by the teddy bear virus hoax. (The original
RM> hoaxee's email is not in my corpus, but the response is.)
AD>> There might be some obvious overlap with existing, better, rulesets. If you
AD>> feel like pointing them out it would be great. One last thing, on the
AD>> CTS_CONFIDENTIAL rule I'm sure there is a way to further consolidate the
AD>> confidential|confidentially|confidentiality, but I'm not exactly sure how. If
AD>> you feel like giving a tip there that will help too.
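
On the consolidation question: the three alternatives share the stem "confidential", so an optional suffix group should cover all of them. A minimal sketch, checked here with Python's re module as a stand-in for SpamAssassin's Perl regexes (this syntax behaves the same in both). The name CONSOLIDATED and the \b word boundaries are my assumptions, not part of the CTS rule:

```python
import re

# Hypothetical consolidation of confidential|confidentially|confidentiality.
# The \b anchors are an assumption; drop them if substring matches are wanted.
CONSOLIDATED = re.compile(r"\bconfidential(?:ly|ity)?\b", re.IGNORECASE)

def matched_word(text):
    """Return the word the pattern matched, or None if nothing matched."""
    m = CONSOLIDATED.search(text)
    return m.group(0) if m else None

for sample in ("strictly confidential", "told me confidentially",
               "Confidentiality assured", "a confident reply"):
    print(sample, "->", matched_word(sample))
```

The same pattern text, /\bconfidential(?:ly|ity)?\b/i, should drop straight into a rule in place of the three-way alternation.
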
RM> My confidential rules (note: my scoring is based on a 9.0 required hits):
RM> header RM_sp_Confidential Subject =~ /Confidential (?:info|med|assist)/i
RM> describe RM_sp_Confidential Subject mentions Confidential info
RM> score RM_sp_Confidential 0.900 # 9s/0h of 100793 corpus (82099s/18694h) 02/21/04
RM> # max: 11s/0h of 97268 corpus (79437s/17831h) 01/24/04
RM> # also matches my RM_sw_Confidential
RM> header RM_sw_Confidential Subject =~ /confidential/i
RM> describe RM_sw_Confidential Subject mentions Confidential info
RM> score RM_sw_Confidential 0.809 # 89s/10h of 97268 corpus (79437s/17831h) 01/24/04
RM> header RM_sw_Confidentialo1 Subject =~ /(?!confidential)c.?o.?n.?f.?i.?d.?e.?n.?t.?i.?a.?l/i
RM> describe RM_sw_Confidentialo1 Subject mentions Confidential info
RM> score RM_sw_Confidentialo1 4.500 # type=obfu - 3s/0h of 91714 corpus (74113s/17601h) 01/23/04
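
A note on how the o1 rule works, since the lookahead is easy to misread: (?!confidential) rejects the plain spelling, and each .? allows at most one filler character between letters, so only gapped spellings hit. A rough sketch in Python's re module (a stand-in for Perl here; the name OBFU1 and the sample subjects are made up for illustration):

```python
import re

# Same pattern text as RM_sw_Confidentialo1; OBFU1 is only an illustrative name.
OBFU1 = re.compile(r"(?!confidential)c.?o.?n.?f.?i.?d.?e.?n.?t.?i.?a.?l",
                   re.IGNORECASE)

def hits(subject):
    """True if the subject contains a gapped spelling of 'confidential'."""
    return OBFU1.search(subject) is not None

for subject in ("Confidential offer",        # plain spelling, lookahead rejects it
                "c.o.n.f.i.d.e.n.t.i.a.l",   # one filler between each letter
                "Con-fidential info",        # a single inserted character
                "c0nfidential"):             # substitution, not insertion
    print(repr(subject), hits(subject))
```

Letter substitutions such as "c0nfidential" fall through this rule, which is the gap the much longer o2 pattern is meant to cover.
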
RM> header RM_sw_Confidentialo2 Subject =~
RM>
/(?!confidential)(?:[c\*\xC7\xE7\xA2\xA9]|\xC4[\x86-\x8D]|\xD0\xA1|\xD1\x81)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[o0\*\xB0\xBA\xD8\xF8\xD2-\xD6\xF2-\xF6]|\(\)|\[\]|\xC5[\x8C-\x91]|\xC6[\xA0-\xA1]|\xC7[\x91-\x92]|\xC7[\xBE-\xBF]|\xCE\x8C|\xCE\x98|\xCE\x9F|\xCE\xB8|\xCE\xBF|\xCF\x8C|\xD0\x9E|\xD0\xBE|\xD5\x95)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[n\xD1\xF1]|\|\\\||\xC5[\x83-\x8B]|\xCE\x9D|\xCE\xA0|\xCE\xAE|\xCE\xB7|\xD5\xB2|\xD5\xB8)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:f|\xC5\xBF|\xC6\x92|\xD2[\x92-\x93]])[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[il1:\|\*\xCC-\xCF\xEC-\xEF\xA6]|\xC4[\xA8-\xB0]|\xC4\xBA|\xC4\xBC|\xC4\xBE|\xC5\x80|\xC5\x82|\xC7[\x8F-\x90]|\xD0[\x86-\x87]|\xD1[\x96-\x97]|\xCE\x8A|\xCE\x90|\xCE\x99|\xCE\xAA|\xCE\xAF|\xCE\xB9|\xCF\x8A)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[d\xD0]|\xC4[\x8E-\x91])[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[e3\*\xC8-\xCB\xE8-\xEB]|\xC4[\x92-\x9B]|\xCE\x88|\xCE\x95|\xCE\xA3|\xCE\xAD|\xCE\xB5|\xD0\x81|\xD0\x95|\xD0\xB5|\xD1\x91)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[n\xD1\xF1]|\|\\\||\xC5[\x83-\x8B]|\xCE\x9D|\xCE\xA0|\xCE\xAE|\xCE\xB7|\xD5\xB2|\xD5\xB8)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[t\+]|\xC5[\xA2-\xA7]|\xCE\xA4|\xCF\x84|\xD0\xA2|\xD1\x82)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[il1:\|\*\xCC-\xCF\xEC-\xEF\xA6]|\xC4[\xA8-\xB0]|\xC4\xBA|\xC4\xBC|\xC4\xBE|\xC5\x80|\xC5\x82|\xC7[\x8F-\x90]|\xD0[\x86-\x87]|\xD1[\x96-\x97]|\xCE\x8A|\xCE\x90|\xCE\x99|\xCE\xAA|\xCE\xAF|\xCE\xB9|\xCF\x8A)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[EMAIL
PROTECTED]|\/\\|\xC4[\x80-\x85]|\xC7[\x8D-\x8E]|\xC7[\xBA-\xBB]|\xCE\x86|\xCE\x91|\xCE\x94|\xCE\x9B|\xCE\xAC|\xCE\xB1|\xD0\x90|\xD0\xB0)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[l1I\|\xA3]|(?:\xC5[\x80-\x82]|\xC4[\xB9-\xBF]))/i
RM> describe RM_sw_Confidentialo2 Subject mentions Confidential info
RM> score RM_sw_Confidentialo2 4.500 # type=obfu - 2s/0h of 91714 corpus (74113s/17601h) 01/23/04
RM> body RM_bpn_Confidential /(?:total(?:ly)?|VERY|strictly|high(?:est|ly)?|utmost) CONFIDEN(?:ce|T(?:AI|IA)L)/i
RM> describe RM_bpn_Confidential says this is very confidential
RM> score RM_bpn_Confidential 1.584 # 409s/6h of 97268 corpus (79437s/17831h) 01/24/04
RM> # ham: membership list, survey confidentiality,
RM> body RM_bpn_Confidential2 /\bconfidential(?:ity)? assured/i
RM> describe RM_bpn_Confidential2 says this is very confidential
RM> score RM_bpn_Confidential2 3.000 # 616s/0h of 106556 corpus (87320s/19236h) 02/27/04
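
For anyone reading the score comments: "89s/10h of 97268 corpus (79437s/17831h)" appears to mean 89 spam hits and 10 ham hits in a corpus of 79437 spam and 17831 ham messages. Assuming that reading, the raw counts can be turned into per-class hit rates, which compare more fairly across corpus runs of different sizes. A small sketch (the function name and the comment layout it expects are my assumptions):

```python
import re

# Assumed layout: "<hits>s/<hits>h of <total> corpus (<spam>s/<ham>h)".
COMMENT = re.compile(r"(\d+)s/(\d+)h of (\d+) corpus \((\d+)s/(\d+)h\)")

def hit_rates(comment):
    """Return (spam_rate, ham_rate) as fractions of each class that hit."""
    m = COMMENT.search(comment)
    if m is None:
        raise ValueError("unrecognised score comment: " + comment)
    spam_hits, ham_hits, total, spam_total, ham_total = map(int, m.groups())
    if spam_total + ham_total != total:
        raise ValueError("corpus totals do not add up: " + comment)
    return spam_hits / spam_total, ham_hits / ham_total

spam_rate, ham_rate = hit_rates("89s/10h of 97268 corpus (79437s/17831h)")
print("spam hit rate: {:.4%}  ham hit rate: {:.4%}".format(spam_rate, ham_rate))
```
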
RM> Bob Menschel
a61.ad0320.cf.out
Description: Binary data
