On Sun, 28 Feb 2010, LuKreme wrote:

> SPF!
>
> <runs; ducking, shucking, and weaving>

You're a brave person. ;)

It's easier to understand the challenge Dave faces, if we look at some
actual From headers.

In my stream, these started in early November of last year, so I just
checked a few months of data from one domain which has had a steady
trickle, AND has the richest ham diversity of all the domains I filter
(translation: highest FP rate, and highest number of custom pass/skip
rules (fortunately, also one of my MOST keen and helpful domain admins)).

Since these started, they've had 19 of these phish:

  1 "Bank of America"<supp...@boa.com>
  1 "PayPaI"<upd...@paypai.com>
  1 "Paypal Inc."<cust_s...@paypalsecurity.com>
  1 "serv...@irs.gov"<serv...@irs.gov>
  1 "serv...@paypal.com"<c>
  1 "serv...@paypal.com"<secur...@act.embarqservices.net>
  3 "serv...@paypal.com"<Security>
  1 "U.S. Bancorp"<off...@usb.com>
  1 "Wachovia"<supp...@wachovia.com>
  1 "Wells Fargo Online"<ofsreponline.al...@wellsfargo.com>
  1 Bank of America <memberserv...@bofa.com>
  2 Bank of America <serv...@boa.com>
  1 Bank of America<memberserv...@boa.com>
  1 Internal Revenue Service<service.refun...@irss.com>
  1 Western Union<memberserv...@poste.it>
  1 Western Union<memberserv...@wumts.com>

(first column is frequency)

This was from a sample size of:

  106171 spams
   43692 hams

Note the variations on Paypal, none of which would trigger an SPF issue
(some did have matching SMTP Senders).

Note the clever use of RealNames to mask the actual From domain. By spam
standards, these are VERY well crafted.

Note that ALL hit my phish tests, as outlined last week. :)

In that same sample, I found only 3 hams with base64
application/octet-stream html attachments. Given their ham diversity,
that was most promising.
The hams were:

  - jcpenney.com (they're already part of our manually maintained
    "bulk" nations, with an implicit set of skip conditions)

  - a local church (one html attachment was in amongst a ton of other
    stuff (mostly Word docs), all domains were already skip listed, and
    the sender already had a modest pass rule)

  - "Britannica Elementary Encyclopedia article" (had _LOTS_ of other
    issues (including INVALID_DATE), and FP'd quite spectacularly!)

When these phish first appeared, I did a similar ham check (further
back, more domains), and found no major issues, so I ended up adding a
base64 html attachment content rule.

Dave, I do have one university professor research domain (and it was one
of the corpora I ham checked), however it's in the social sciences, so
it's probably a significantly different ham ecology from what you're
seeing.

I have a strong impression that you're a :) data analyzing kind of guy,
and probably have decent logs. Do you see many ham base64 html
attachments? That's more my curiosity, than anything. Those just feel
like the sort of thing it's legitimate to penalize, though of course it
depends on your FP pipeline tools, and user community. I've supported
PhDs, and find quilting grannies far easier. ;)

In my own post-SA filter, I've been extracting URLs from these, for
years. In most cases the domains were VERY useful and did trigger on
some blocklists. It's definitely the more technically correct approach.
I still use the kludge content rule, mainly for belts-and-suspenders,
since these ARE well crafted.

Given the low rate of occurrence in ham, I didn't anticipate any
significant extraction performance issues, though I do have a size
constraint in place. If that's the concern, these have all been
small-ish.

John mentioned the reasoning for SA not extracting was:

  "if the MUA doesn't display it automatically, why should we scan it?"

Which makes perfect sense as a general principle, however, in the case
of these phish, social engineering is the vector for their display.

Apologies if I'm missing blatant Perl or SA architecture issues, about
which, I am only an egg.

- "Chip"
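
For anyone who wants to experiment with the attachment URL extraction
described above, here is a minimal Python sketch. It is not the post-SA
filter mentioned in this message, and it is not SpamAssassin code. The
names attachment_hosts, looks_like_html, HREF_RE, and the 256 KB
MAX_ATTACHMENT_BYTES cap are all hypothetical, chosen for illustration.
It walks the MIME parts, keeps only base64 application/octet-stream
parts that look like HTML and fit under the size cap, and returns the
hostnames found in href/action/src URLs so they can be checked against
whatever blocklists you already use.

```python
import re
from email import message_from_bytes, policy
from urllib.parse import urlsplit

# Hypothetical size cap; the post only says "a size constraint" is in place.
MAX_ATTACHMENT_BYTES = 256 * 1024

# Absolute http(s) URLs in href/action/src attributes (deliberately simple).
HREF_RE = re.compile(
    rb"""(?:href|action|src)\s*=\s*["']?(https?://[^"'\s>]+)""", re.I
)


def looks_like_html(part, payload):
    """Guess whether an octet-stream part is really HTML."""
    name = (part.get_filename() or "").lower()
    return name.endswith((".htm", ".html")) or b"<html" in payload[:1024].lower()


def attachment_hosts(raw_message: bytes) -> set[str]:
    """Return hostnames linked from base64 octet-stream HTML attachments."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    hosts = set()
    for part in msg.walk():
        if part.get_content_type() != "application/octet-stream":
            continue
        cte = str(part.get("Content-Transfer-Encoding") or "").strip().lower()
        if cte != "base64":
            continue
        payload = part.get_payload(decode=True) or b""
        if len(payload) > MAX_ATTACHMENT_BYTES:
            continue  # skip oversized parts to bound the scanning cost
        if not looks_like_html(part, payload):
            continue
        for match in HREF_RE.finditer(payload):
            try:
                host = urlsplit(match.group(1).decode("ascii", "replace")).hostname
            except ValueError:
                continue  # malformed URL, e.g. a broken IPv6 literal
            if host:
                hosts.add(host.lower())
    return hosts
```

The regex is intentionally crude: it will miss obfuscated or
script-built URLs, so treat the result as an extra signal alongside a
content rule, not a replacement for one.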