Ran these against my corpus. Here are the worst performers (lots in common with RW's complaints):

  SPAM%    HAM%     S/O   NAME
  0.013   0.153   0.080   __RULEGEN_PHISH_BLR6YY
  0.006   0.286   0.022   __RULEGEN_PHISH_0ATBRI
  0.008   0.334   0.023   __RULEGEN_PHISH_L3I0Z5
  0.002   0.300   0.006   __RULEGEN_PHISH_LGYG7Q
  0.017   1.387   0.012   __RULEGEN_PHISH_QVS6GE
  0.045   2.490   0.018   __RULEGEN_PHISH_UNQ4VP
  0.027   2.011   0.013   __RULEGEN_PHISH_B9HL3A

body __RULEGEN_PHISH_UNQ4VP  / may contain information that is /
body __RULEGEN_PHISH_QVS6GE  / or entity to which it is addressed/
body __RULEGEN_PHISH_B9HL3A  /The information contained in this /
body __RULEGEN_PHISH_0ATBRI  / it is addressed\. If you are n/
body __RULEGEN_PHISH_LGYG7Q  / you have received it in error. /
body __RULEGEN_PHISH_BLR6YY  /uthorised and regulated by the /
body __RULEGEN_PHISH_L3I0Z5  / is intended solely for the ..d/

A large number of the FPs come from Paypal and similar services. Even
controlling for those, I haven't found the phishing ruleset useful at
all. The fraud rules do have limited utility.

What relationship does this have to the 10+ year-old SARE stuff?

On 12/20/2014 03:35 AM, Axb wrote:
> On 12/18/2014 06:27 PM, RW wrote:
>> On Tue, 16 Dec 2014 13:10:05 +0100
>> Axb wrote:
>>
>>> https://sourceforge.net/projects/sare/files/
>>>
>>> replaces any older version.
>>>
>>> leech while it lasts....
>>>
>>> adjust scores if needed..
>>
>>
>> There are some rules that shouldn't be there. (I only tested a few that
>> looked the most dubious)
>>
>> The first is a common phrase in mail from UK banks and other financial
>> services companies. Note the "ise" spelling which is common outside
>> the US.
>>
>> body __RULEGEN_PHISH_BLR6YY /uthorised and regulated by the /
>>
>>
>> The following are common in legal disclaimer signatures:
>>
>> body __RULEGEN_PHISH_UNQ4VP / may contain information that is /
>> body __RULEGEN_PHISH_B9HL3A /The information contained in this /
>> body __RULEGEN_PHISH_C6URDE / do not necessarily represent those of /
>> body __RULEGEN_PHISH_L3I0Z5 / is intended solely for the ..d/
>>
>>
>> This hits some of my ham:
>>
>> body __RULEGEN_PHISH_SRX3XZ / apologize for any inconvenience/
>>
>>
>> Unless there's a bug, the fact that those disclaimer phrases got through
>> suggests that these rules are either intended to be very much more
>> aggressive than the SOUGHT rules, or the ham corpus isn't good enough.
>
>
> as the rules were generated with donated corpus data, you're more than
> welcome to send me an archive of ham samples to avoid these potential
> issues.
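
For anyone unfamiliar with the S/O column in the table at the top: as far as I understand it, S/O is the share of a rule's hits that land on spam, i.e. SPAM% / (SPAM% + HAM%). That formula is my assumption about how the column is derived; nothing in this thread states it. Below is a minimal Python sketch that recomputes S/O from the listed percentages. The names `ROWS`, `s_o` and `recompute` are made up for illustration. Small differences from the listed values are expected because the percentages are already rounded to three decimals.

```python
# Rows copied from the table in this message:
# (SPAM%, HAM%, listed S/O, rule name)
ROWS = [
    (0.013, 0.153, 0.080, "__RULEGEN_PHISH_BLR6YY"),
    (0.006, 0.286, 0.022, "__RULEGEN_PHISH_0ATBRI"),
    (0.008, 0.334, 0.023, "__RULEGEN_PHISH_L3I0Z5"),
    (0.002, 0.300, 0.006, "__RULEGEN_PHISH_LGYG7Q"),
    (0.017, 1.387, 0.012, "__RULEGEN_PHISH_QVS6GE"),
    (0.045, 2.490, 0.018, "__RULEGEN_PHISH_UNQ4VP"),
    (0.027, 2.011, 0.013, "__RULEGEN_PHISH_B9HL3A"),
]


def s_o(spam_pct, ham_pct):
    """Share of hits that are spam, ASSUMING S/O = SPAM% / (SPAM% + HAM%)."""
    total = spam_pct + ham_pct
    # A rule that never hits has no meaningful ratio; report 0.0 for it.
    return spam_pct / total if total else 0.0


def recompute(rows):
    """Return (rule name, listed S/O, recomputed S/O) for each row."""
    return [
        (name, listed, round(s_o(spam, ham), 3))
        for spam, ham, listed, name in rows
    ]


if __name__ == "__main__":
    for name, listed, calc in recompute(ROWS):
        print(f"{name}  listed={listed:.3f}  recomputed={calc:.3f}")
```

A value near 1.0 would mean the rule hits almost only spam. Every rule listed here sits below 0.1, which is the point of the complaint: these rules hit far more ham than spam in this corpus.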