On Wed, 2003-08-06 at 05:39, Robert Menschel wrote:
> CS> For those who may not see it, these are not the sender of the spam
> CS> domains, but the domain of the image hosts, often owned by spammers.
> CS> Therefore it is ever changing like a RBL. So submissions of these to
> CS> the Rule Emporium would be tooo lengthy. You would almost have to
> CS> have an RBL for the rule :)
>
> We could, however, set up a blacklist through a website, such that
> anyone can submit an entry, a simple domain name such as time4more.net,
> or an IP address if that's the reference in the spam, or a more specific
> URI (spaml3.time4more.net/spamdir or 123.234.56.78/spamdir). The web
> system would track submissions, and create a ruleset from them.
>
> Initial score on first submission would be 0.1, with score increasing
> perhaps to 1.0 as additional submissions/reports come in. We could also
> have password-authorized trusted submitters, whose submissions would
> score higher (allowing scores to get up to 2.5 perhaps).
>
> Perhaps these scores would be doubled for those systems not using DNSBLs?
>
> The system would then dump these scores into an ASCII file that could be
> retrieved by anonymous FTP. This file could be stored as auto-uribl.cf
> for those who can have multiple local.cf files, and could be
> automatically added to the user_prefs file for people like me who are
> limited to the user_prefs file. (Such rules wouldn't do any good unless
> you use a system like mine that calls SA a second time.)
Already thought of doing that. In fact I'm already distributing rules to
my customers' machines by http/wget, but the idea of automating it and
opening it up to public submission has a couple of problems.

How do you verify / trust a submission? How do you know that a rule is
right? You can't verify them all by hand; between them, spammers can
register domains faster than a person can make rules.

If you let anybody submit a domain name then you end up with false
positives from people misreading [EMAIL PROTECTED] and stupid redirectors,
so that means archiving the spam in case you need to verify a rule, and
possibly a lot of time & hassle verifying rules.

If you try to match the whole address then spammers will use wildcard
hosts and large random strings; if you match the domain name only you'll
catch all the free webhosting providers.

The only solutions to this which I could think of are:

Automated solution.
Some sort of recursive scoring routine: give 0.1 for domain, 0.2 for
host.domain, 0.2 for domain/dir/, 0.3 for host.domain/dir/dir, etc., and
just let the score mount up as it gets closer to a match. But that could
make for lots of processing, and a massive rulefile.

or

Manual solution.
Have people submit entire spam messages, grep the URLs out of them, sort
them by frequency and make rules by hand for the top ones.

That's basically what I do now: I get a gzipped mbox of spam once a week
from each customer, run it through a script, and make a few rules (URLs,
subjects) from the results.

Anyone have a better solution?

> >>header L_s_CorelWPOffice Subject =~
> >>/(?:Corel|WordPerfect).{1,15}Office/i
>
> MK> More \b action, on general principle, although not strictly needed.
>
> Agreed. Thanks.
>
> CS> Yeah, I have the norton system works rule like this. If you don't use
> CS> WP office, then by all means make a rule. But an ISP would shy away
> CS> from this one.
>
> Actually, we DO use WP Office. And we frequently share files from WP
> Office. But we don't refer to WP Office as such in subject headings. Just
> like we don't name each other in subject headings either.
>
> As for an ISP, I would think it's still a valid rule; they'd just need to
> be careful to score it low enough to be incremental rather than
> definitional.

No way is that a valid rule for an ISP to use. A good rule looks for
something which only appears in spam; WPOffice probably appears in as
much ham as spam. I seem to recall the subject touted some % discount or
% off WPOffice.

/[%\$].{4,20}(?:corel|wordperfect).{1,15}office/i
/(?:corel|wordperfect).{1,15}office.{4,20}[%\$]/i

These match a mention of % or $ with WPOffice, and don't match "can you
read wordperfect office files" and other such obvious FP fodder. Still not
perfect, still not a rule for an ISP to use, but a better rule than it
was.

> >>header L_hr_lattelekom Received =~ /lattelekom\.net/
>
> MK> Seems fine, although a bit of a duplication of effort with DNSBL's..
> MK> have you enabled them?
>
> DNSBLs are enabled by my host. I wouldn't be without them.
>
> This was a spam that didn't score from them -- apparently it's too new a
> pathway. This should probably be given a temporary name/flag, and removed
> once the DNSBLs catch up.

Do they need to catch up, or do they need someone to submit it? It won't
get listed if nobody submits it, and if you submit it instead of writing a
rule for it, you'll never have to remove that rule if/when the host becomes
secure.

One point I would like to make about all this rule-writing is documenting
the rules you make, not just date stamping them. A couple of lines of
comments reminding you why you made a rule is always a good thing, and
including the line you're matching from the original spam will help you
improve the rule if the spammer morphs.
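As a rough sketch of the kind of commenting meant here, a documented entry
in a rules file might look something like the following. The rule name
L_s_CorelWPOffice_Disc, the date stamp, the score and the sample subject
line are all made up for illustration (the original spam isn't quoted in
this thread); the regex is adapted from the discount pattern above.

```
# Added: 2003-08-06 (hypothetical date stamp)
# Why:   spam subjects touting a % or $ discount on WordPerfect Office.
#        The bare product name shows up in ham too, so require a % or $
#        shortly before it.
# Sample subject (hypothetical, not the original spam):
#        Subject: Save 80% on Corel WordPerfect Office
header   L_s_CorelWPOffice_Disc  Subject =~ /[%\$].{4,20}(?:corel|wordperfect).{1,15}office/i
describe L_s_CorelWPOffice_Disc  Subject mentions % or $ before WP Office
score    L_s_CorelWPOffice_Disc  0.5
```

With the sample line sitting next to the rule, it's easy to see which part
of the pattern to loosen when the spammer changes the wording, and the
"why" tells you whether the rule is still worth keeping at all.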
--
Yorkshire Dave