Hello Alton,

Friday, March 19, 2004, 2:52:22 PM, you wrote:

AD> Would someone give these rules a run and see how much trouble they cause?

My corpus results attached.

CTS_CONFIDENTIAL has problems, as does CTS_PROPOSITION. CTS_ANONYMOUS is
questionable, but might be useful -- the sample of hits is small.

CTS_APOLOGY hits only ham here. CTS_VIRUSWARN2 also hit only ham:
someone's response to a person fooled by the teddy bear virus hoax. (The
original hoaxee's email is not in my corpus, but the response is.)

AD> There might be some obvious overlap with existing, better, rulesets. If you
AD> feel like pointing them out it would be great. One last thing, on the
AD> CTS_CONFIDENTIAL rule I'm sure there is a way to further consolidate the
AD> confidential|confidentially|confidentiality, but I'm not exactly sure how.
AD> If you feel like giving a tip there that will help too.
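
On consolidating the alternation: the three words share the stem
"confidential", so one way to fold them together is an optional suffix
group, confidential(?:ly|ity)?. A minimal sketch, using Python's re module
as a stand-in for the Perl regex engine SpamAssassin uses; the \b anchors,
the helper name and the sample strings are assumptions for illustration,
not part of the rule under discussion.

```python
import re

# Long form: the alternation from the quoted question (the \b anchors are an assumption).
LONG = re.compile(r"\b(?:confidential|confidentially|confidentiality)\b", re.I)
# Consolidated form: shared stem plus an optional suffix.
SHORT = re.compile(r"\bconfidential(?:ly|ity)?\b", re.I)

def matched(pattern, text):
    """Return the matched text, or None when the pattern does not hit."""
    m = pattern.search(text)
    return m.group(0) if m else None

samples = [
    "Strictly Confidential",
    "I tell you this confidentially.",
    "our confidentiality policy",
    "confidentialize",  # not one of the three words
]

# Both forms should agree on every sample.
for text in samples:
    print(repr(text), matched(LONG, text), matched(SHORT, text))
```

Without the \b anchors, a plain /confidential/i already matches all three
words as a substring, which is what RM_sw_Confidential relies on.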

My confidential rules (note: my scores assume a required_hits of 9.0):

header    RM_sp_Confidential      Subject =~ /Confidential (?:info|med|assist)/i
describe  RM_sp_Confidential      Subject mentions Confidential info
score     RM_sp_Confidential      0.900  # 9s/0h of 100793 corpus (82099s/18694h) 02/21/04
                                         # max: 11s/0h of 97268 corpus (79437s/17831h) 01/24/04
                                         # also matches my RM_sw_Confidential
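
For reading the tallies in the score comments: "9s/0h of 100793 corpus
(82099s/18694h)" means 9 spam hits and 0 ham hits, in a corpus of 82099
spam and 18694 ham messages. A rough sketch of turning such a comment into
hit rates follows. The helper name is hypothetical, and the S/O figure is
computed as the spam share of the combined hit percentages, which is an
assumption about how that ratio is usually defined.

```python
import re

# Hypothetical helper: parses a tally such as
# "9s/0h of 100793 corpus (82099s/18694h)".
TALLY = re.compile(r"(\d+)s/(\d+)h of (\d+) corpus \((\d+)s/(\d+)h\)")

def hit_rates(comment):
    """Return (spam %, ham %, S/O) for one tally comment."""
    m = TALLY.search(comment)
    if m is None:
        raise ValueError("no tally found in: %r" % comment)
    spam_hits, ham_hits, _total, spam_total, ham_total = map(int, m.groups())
    spam_pct = 100.0 * spam_hits / spam_total
    ham_pct = 100.0 * ham_hits / ham_total
    overall = spam_pct + ham_pct
    # S/O: share of the combined hit percentage that comes from spam
    # (assumed definition); undefined when the rule never hit.
    so = spam_pct / overall if overall else None
    return spam_pct, ham_pct, so

spam_pct, ham_pct, so = hit_rates("9s/0h of 100793 corpus (82099s/18694h)")
print("spam %.4f%%  ham %.4f%%  S/O %s" % (spam_pct, ham_pct, so))
```

A rule with no ham hits comes out at S/O 1.0 however few spam hits it has,
so the raw counts still matter when picking a score.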

header    RM_sw_Confidential     Subject =~ /confidential/i
describe  RM_sw_Confidential     Subject mentions Confidential info
score     RM_sw_Confidential     0.809  # 89s/10h of 97268 corpus (79437s/17831h) 01/24/04
header    RM_sw_Confidentialo1   Subject =~ /(?!confidential)c.?o.?n.?f.?i.?d.?e.?n.?t.?i.?a.?l/i
describe  RM_sw_Confidentialo1   Subject mentions Confidential info
score     RM_sw_Confidentialo1   4.500  # type=obfu - 3s/0h of 91714 corpus (74113s/17601h) 01/23/04
header    RM_sw_Confidentialo2   Subject =~ /(?!confidential)(?:[c\*\xC7\xE7\xA2\xA9]|\xC4[\x86-\x8D]|\xD0\xA1|\xD1\x81)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[o0\*\xB0\xBA\xD8\xF8\xD2-\xD6\xF2-\xF6]|\(\)|\[\]|\xC5[\x8C-\x91]|\xC6[\xA0-\xA1]|\xC7[\x91-\x92]|\xC7[\xBE-\xBF]|\xCE\x8C|\xCE\x98|\xCE\x9F|\xCE\xB8|\xCE\xBF|\xCF\x8C|\xD0\x9E|\xD0\xBE|\xD5\x95)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[n\xD1\xF1]|\|\\\||\xC5[\x83-\x8B]|\xCE\x9D|\xCE\xA0|\xCE\xAE|\xCE\xB7|\xD5\xB2|\xD5\xB8)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:f|\xC5\xBF|\xC6\x92|\xD2[\x92-\x93])[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[il1:\|\*\xCC-\xCF\xEC-\xEF\xA6]|\xC4[\xA8-\xB0]|\xC4\xBA|\xC4\xBC|\xC4\xBE|\xC5\x80|\xC5\x82|\xC7[\x8F-\x90]|\xD0[\x86-\x87]|\xD1[\x96-\x97]|\xCE\x8A|\xCE\x90|\xCE\x99|\xCE\xAA|\xCE\xAF|\xCE\xB9|\xCF\x8A)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[d\xD0]|\xC4[\x8E-\x91])[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[e3\*\xC8-\xCB\xE8-\xEB]|\xC4[\x92-\x9B]|\xCE\x88|\xCE\x95|\xCE\xA3|\xCE\xAD|\xCE\xB5|\xD0\x81|\xD0\x95|\xD0\xB5|\xD1\x91)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[n\xD1\xF1]|\|\\\||\xC5[\x83-\x8B]|\xCE\x9D|\xCE\xA0|\xCE\xAE|\xCE\xB7|\xD5\xB2|\xD5\xB8)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[t\+]|\xC5[\xA2-\xA7]|\xCE\xA4|\xCF\x84|\xD0\xA2|\xD1\x82)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[il1:\|\*\xCC-\xCF\xEC-\xEF\xA6]|\xC4[\xA8-\xB0]|\xC4\xBA|\xC4\xBC|\xC4\xBE|\xC5\x80|\xC5\x82|\xC7[\x8F-\x90]|\xD0[\x86-\x87]|\xD1[\x96-\x97]|\xCE\x8A|\xCE\x90|\xCE\x99|\xCE\xAA|\xCE\xAF|\xCE\xB9|\xCF\x8A)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[EMAIL PROTECTED]|\/\\|\xC4[\x80-\x85]|\xC7[\x8D-\x8E]|\xC7[\xBA-\xBB]|\xCE\x86|\xCE\x91|\xCE\x94|\xCE\x9B|\xCE\xAC|\xCE\xB1|\xD0\x90|\xD0\xB0)[\x01-\x2F\\\^_`\|\x7F-\xA1\xA4-\xA8\xAB-\xAD\xAF-\xB1\xB4\xB7-\xBB\xBF\xF7]?(?:[l1I\|\xA3]|(?:\xC5[\x80-\x82]|\xC4[\xB9-\xBF]))/i
describe  RM_sw_Confidentialo2   Subject mentions Confidential info
score     RM_sw_Confidentialo2   4.500  # type=obfu - 2s/0h of 91714 corpus (74113s/17601h) 01/23/04
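
How the obfuscation rules work, in miniature: the (?!confidential)
lookahead refuses any match that starts on the plain spelling, and the .?
between letters lets one filler character sit in each gap, so only
disguised spellings hit. A small sketch of the RM_sw_Confidentialo1
pattern, using Python's re module as a stand-in for Perl regexes; the
function name and the sample subjects are made up for illustration.

```python
import re

WORD = "confidential"
# Same shape as RM_sw_Confidentialo1: negative lookahead on the plain
# word, then each letter separated by at most one arbitrary character.
OBFU = re.compile("(?!" + WORD + ")" + ".?".join(WORD), re.I)

def is_obfuscated(subject):
    """True when the subject carries a disguised 'confidential'."""
    return OBFU.search(subject) is not None

for subject in [
    "Strictly Confidential",       # plain spelling: lookahead blocks it
    "C.o.n.f.i.d.e.n.t.i.a.l",     # dots between the letters
    "con fidential report",        # one inserted space
]:
    print(repr(subject), is_obfuscated(subject))
```

This simple form only allows filler between the letters. Substituted
letters (a zero for the "o", accented characters and so on) are what the
much longer RM_sw_Confidentialo2 character classes are for.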
                                         
body      RM_bpn_Confidential    /(?:total(?:ly)?|VERY|strictly|high(?:est|ly)?|utmost) CONFIDEN(?:ce|T(?:AI|IA)L)/i
describe  RM_bpn_Confidential    says this is very confidential
score     RM_bpn_Confidential    1.584  # 409s/6h of 97268 corpus (79437s/17831h) 01/24/04
                                        # ham: membership list, survey confidentiality,
body      RM_bpn_Confidential2   /\bconfidential(?:ity)? assured/i
describe  RM_bpn_Confidential2   says this is very confidential
score     RM_bpn_Confidential2   3.000  # 616s/0h of 106556 corpus (87320s/19236h) 02/27/04
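
A quick way to sanity-check the two body patterns before a corpus run is
to throw a few phrases at them. The sketch below copies the two regexes
into Python's re module (a stand-in for Perl; these particular patterns
behave the same in both as far as I know) and uses invented sample text.
Note the T(?:AI|IA)L branch, which also catches the common "confidentail"
misspelling.

```python
import re

# Copied from RM_bpn_Confidential and RM_bpn_Confidential2.
VERY_CONF = re.compile(
    r"(?:total(?:ly)?|VERY|strictly|high(?:est|ly)?|utmost) "
    r"CONFIDEN(?:ce|T(?:AI|IA)L)",
    re.I,
)
ASSURED = re.compile(r"\bconfidential(?:ity)? assured", re.I)

def hits(text):
    """Return the names of the rules (of these two) that match the text."""
    names = []
    if VERY_CONF.search(text):
        names.append("RM_bpn_Confidential")
    if ASSURED.search(text):
        names.append("RM_bpn_Confidential2")
    return names

for text in [
    "This transaction is strictly confidential.",
    "I write to you in the utmost confidence.",
    "Please treat this as very confidentail.",   # transposed letters
    "100% risk free, confidentiality assured!",
    "Our confidentiality policy is attached.",   # neither rule
]:
    print(repr(text), hits(text))
```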

Bob Menschel
