Re: Re[2]: [SAtalk] [RD] Rule Philosophy

Yorkshire Dave Fri, 08 Aug 2003 03:18:15 -0700

On Wed, 2003-08-06 at 05:39, Robert Menschel wrote:

> CS> For those who may not see it, these are not the spam senders' domains
> CS> but the domains of the image hosts, often owned by spammers.
> CS> Therefore they are ever-changing, like an RBL. So submitting these to
> CS> the Rule Emporium would be too lengthy. You would almost have to
> CS> have an RBL for the rule :)
> 
> We could, however, set up a blacklist through a website, such that
> anyone can submit an entry, a simple domain name such as time4more.net,
> or an IP address if that's the reference in the spam, or a more specific
> URI (spaml3.time4more.net/spamdir or 123.234.56.78/spamdir). The web
> system would track submissions, and create a ruleset from them.
> 
> Initial score on first submission would be 0.1, with score increasing
> perhaps to 1.0 as additional submissions/reports come in. We could also
> have password-authorized trusted submitters, whose submissions would
> score higher (allowing scores to get up to 2.5 perhaps).
> 
> Perhaps these scores would be doubled for those systems not using DNSBLs?
> 
> The system would then dump these scores into an ASCII file that could be
> retrieved by anonymous FTP. This file could be stored as auto-uribl.cf
> for those who can have multiple local.cf files, and could be
> automatically added to the user_prefs file for people like me who are
> limited to the user_prefs file. (Such rules wouldn't do any good unless
> you use a system like mine that calls SA a second time.)
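The submission-scoring scheme quoted above can be sketched roughly as follows. Only the 0.1 starting score, the 1.0 and 2.5 ceilings and the doubling for sites without DNSBLs come from the proposal. The per-report increments (0.1 per public report, 0.5 per trusted report) and the `L_URIBL_nnnn` rule names are assumptions made up for illustration.

```python
import re
from dataclasses import dataclass

# From the proposal: first report scores 0.1, public reports top out at 1.0,
# trusted reports can lift a score to 2.5, optionally doubled without DNSBLs.
PUBLIC_STEP = 0.1   # ASSUMPTION: each public report adds 0.1 (first = 0.1)
TRUSTED_STEP = 0.5  # ASSUMPTION: weight of one trusted-submitter report
PUBLIC_CAP = 1.0
TRUSTED_CAP = 2.5


@dataclass
class Entry:
    pattern: str  # domain, IP or URI fragment, e.g. "time4more.net"
    public_reports: int = 0
    trusted_reports: int = 0


def entry_score(e: Entry, uses_dnsbl: bool = True) -> float:
    """Score one blacklist entry from its report counts."""
    public = min(PUBLIC_CAP, PUBLIC_STEP * e.public_reports)
    score = min(TRUSTED_CAP, public + TRUSTED_STEP * e.trusted_reports)
    if not uses_dnsbl:
        score *= 2  # "doubled for those systems not using DNSBLs"
    return round(score, 2)


def to_cf(entries, uses_dnsbl: bool = True) -> str:
    """Render entries as SpamAssassin uri rules, e.g. for auto-uribl.cf."""
    out = []
    for i, e in enumerate(entries):
        score = entry_score(e, uses_dnsbl)
        if score == 0:
            continue  # nobody has reported it yet
        name = f"L_URIBL_{i:04d}"  # hypothetical naming scheme
        # Escape regex metacharacters, then the "/" delimiter of the rule.
        regex = re.escape(e.pattern).replace("/", r"\/")
        out.append(f"uri {name} /{regex}/i")
        out.append(f"score {name} {score:.2f}")
    return "\n".join(out) + "\n"


print(to_cf([Entry("time4more.net", public_reports=3),
             Entry("123.234.56.78/spamdir", public_reports=1, trusted_reports=2)]))
```

The resulting text is what would sit in the FTP-able file; how submissions are collected and authenticated is left out.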


Already thought of doing that; in fact I'm already distributing rules to
my customers' machines by HTTP/wget, but the idea of automating it and
opening it up to public submission has a couple of problems.

How do you verify / trust a submission? How do you know that a rule is
right?

You can't verify them all by hand; between them, spammers can register
domains faster than a person can make rules.

If you let anybody submit a domain name, you end up with false
positives from people misreading [EMAIL PROTECTED] and stupid redirectors.
That means archiving the spam in case you need to verify a rule, and
possibly a lot of time and hassle verifying rules.

If you try to match the whole address, spammers will use wildcard
hosts and long random strings; if you match the domain name only, you'll
hit all the free web-hosting providers.

The only solutions I could think of are:

Automated solution.
Some sort of recursive scoring routine: give 0.1 for domain, 0.2 for
host.domain, 0.2 for domain/dir/, 0.3 for host.domain/dir/dir, etc.,
and just let the score mount up as it gets closer to a match. But that
could make for lots of processing and a massive rule file.
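A rough sketch of that recursive scoring, assuming the blacklist is kept as a plain set of strings like "time4more.net" or "spaml3.time4more.net/spamdir". The weights follow the numbers above (domain 0.1, host.domain 0.2, domain/dir 0.2, host.domain/dir... 0.3). Treating the last two labels as "the domain" is a naive assumption (wrong for .co.uk and the like), and the function names are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def registered_domain(host: str) -> str:
    """Naive 'domain' part of a host: the last two labels.

    ASSUMPTION: fine for .com/.net, wrong for e.g. .co.uk.
    IP addresses are returned unchanged.
    """
    try:
        ipaddress.ip_address(host)
        return host
    except ValueError:
        return ".".join(host.split(".")[-2:])


def candidate_keys(url: str):
    """Yield (key, weight in tenths) for each prefix of the URL worth scoring."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    domain = registered_domain(host)
    segments = [s for s in parts.path.split("/") if s]
    bases = [(domain, 0)]
    if host != domain:
        bases.append((host, 1))  # a named host is one step more specific
    for base, bonus in bases:
        yield base, 1 + bonus  # domain 0.1, host.domain 0.2
        for k in range(1, len(segments) + 1):
            # domain/dir... 0.2, host.domain/dir... 0.3
            yield base + "/" + "/".join(segments[:k]), 2 + bonus


def uri_score(url: str, blacklist: set) -> float:
    """Sum the weights of every candidate key present in the blacklist."""
    tenths = sum(w for key, w in candidate_keys(url) if key in blacklist)
    return tenths / 10  # summing integer tenths avoids float drift


listed = {"time4more.net", "spaml3.time4more.net/spamdir"}
print(uri_score("http://spaml3.time4more.net/spamdir/index.html", listed))  # 0.4
```

Because each candidate is a set lookup, the work per URL grows with its path depth rather than with the size of the list, which eases the processing worry; the catch is that this would have to run outside the stock rule files (for example as a pre-filter), which is not shown here.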

or

Manual solution.
Have people submit entire spam messages, grep the URLs out of them, sort
them by frequency, and make rules by hand for the top ones. That's
basically what I do now: I get a gzipped mbox of spam once a week from
each customer, run it through a script, and make a few rules (URLs,
subjects) from the results.
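The grep-and-sort step could look something like this. It is a minimal sketch, not the script described above: the URL regex is deliberately rough, the function names are hypothetical, and it assumes the weekly mbox has already been gunzipped, since `mailbox.mbox` reads a plain file.

```python
import mailbox
import re
import sys
from collections import Counter

# Rough URL matcher: scheme plus everything up to whitespace, quotes or brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>()]+", re.IGNORECASE)


def urls_in_text(text: str) -> list:
    """Pull URLs out of a message body, dropping trailing punctuation."""
    return [u.rstrip(".,;:!") for u in URL_RE.findall(text)]


def count_urls(texts) -> Counter:
    """Tally how often each URL appears across all bodies."""
    counts = Counter()
    for text in texts:
        counts.update(urls_in_text(text))
    return counts


def mbox_texts(path: str):
    """Yield the text parts of every message in an uncompressed mbox."""
    box = mailbox.mbox(path)
    try:
        for msg in box:
            for part in msg.walk():
                if part.get_content_maintype() != "text":
                    continue  # skip images, attachments and multipart wrappers
                payload = part.get_payload(decode=True)
                if payload:
                    # latin-1 decodes any byte sequence; URLs are ASCII anyway.
                    yield payload.decode("latin-1")
    finally:
        box.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the 20 most frequent URLs, most common first.
    for url, n in count_urls(mbox_texts(sys.argv[1])).most_common(20):
        print(f"{n:6d}  {url}")
```

Counting whole URLs keeps the output close to what you'd grep by hand; counting only hostnames would group spammers' random paths together, at the cost of lumping in free hosting providers.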

Anyone have a better solution?

> >>header    L_s_CorelWPOffice  Subject =~ /(?:Corel|WordPerfect).{1,15}Office/i
> 
> MK> More \b action, on general principle, although not strictly needed.
> 
> Agreed. Thanks.
> 
> CS> Yeah, I have the norton system works rule like this. If you don't use
> CS> WP office, then by all means make a rule. But an ISP would shy away
> CS> from this one.
> 
> Actually, we DO use WP Office. And we frequently share files from WP
> Office. But we don't refer to WP Office as such in subject headings. Just
> like we don't name each other in subject headings either.
> 
> As for an ISP, I would think it's still a valid rule; they'd just need to
> be careful to score it low enough to be incremental rather than
> definitional.

No way is that a valid rule for an ISP to use. A good rule looks for
something that only appears in spam; WP Office probably appears in as
much ham as spam.

ISTR (I seem to recall) the subject touted some % discount or % off WP Office:

/[%\$].{4,20}(?:corel|wordperfect).{1,15}office/i

/(?:corel|wordperfect).{1,15}office.{4,20}[%\$]/i

These match a mention of % or $ together with WP Office, but not
"can you read wordperfect office files" and other such obvious FP
fodder. Still not perfect, still not a rule for an ISP to use, but a
better rule than it was.
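A quick way to sanity-check patterns like these before they go into a rule file is to run them against a few subjects. This sketch uses Python's `re`, which treats these particular constructs the same way Perl does; the subject lines are made up for the test, not taken from real spam, and the names are hypothetical.

```python
import re

# The two price-plus-product subject patterns, with "{1,15}" as a quantifier
# and "[%\$]" as the class (a "|" inside [...] would match a literal pipe).
PRICE_BEFORE = re.compile(r"[%\$].{4,20}(?:corel|wordperfect).{1,15}office", re.I)
PRICE_AFTER = re.compile(r"(?:corel|wordperfect).{1,15}office.{4,20}[%\$]", re.I)

# Written as ".(1,15)", the "(1,15)" is a capture group for the literal
# text "1,15", so the pattern effectively never matches real subjects.
TYPO = re.compile(r"(?:corel|wordperfect).(1,15)office", re.I)


def looks_like_wp_offer(subject: str) -> bool:
    """True if either price pattern matches the subject."""
    return bool(PRICE_BEFORE.search(subject) or PRICE_AFTER.search(subject))


# Hypothetical test subjects, not taken from real spam.
for subject in ("Save 50% off Corel WordPerfect Office 11",
                "Corel WordPerfect Office 11 - only $49",
                "can you read wordperfect office files"):
    print(looks_like_wp_offer(subject), subject)
```

The first two subjects match and the third does not; adding MK's `\b` anchors around the words would tighten this further.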

> 
> >>header    L_hr_lattelekom  Received =~ /lattelekom\.net/
> 
> MK> Seems fine, although a bit of a duplication of effort with DNSBL's..
> MK> have you enabled them?
> 
> DNSBLs are enabled by my host. I wouldn't be without them.
> 
> This was a spam that didn't score from them -- apparently it's too new a
> pathway. This should probably be given a temporary name/flag, and removed
> once the DNSBLs catch up.

Do they need to catch up, or does someone need to submit it? It won't
get listed if nobody submits it, and if you submit it instead of writing
a rule for it, you'll never have to remove that rule if/when it becomes
secure.

One point I'd like to make about all this rule-writing: document the
rules you make, don't just date-stamp them. A couple of lines of
comments reminding you why you made a rule is always a good thing, and
including the line you're matching from the original spam will help you
improve the rule if the spammer morphs.
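If rules are generated by a script anyway, the comment habit can be built in. A minimal sketch: the helper name `documented_rule`, its comment layout, the 0.5 score and the sample spam line are all hypothetical; the output uses ordinary SpamAssassin `header`/`score` lines with `#` comments.

```python
from datetime import date


def documented_rule(name, header, pattern, score, why, spam_line, added=None):
    """Return a SpamAssassin header rule preceded by explanatory comments.

    The comments record when and why the rule was written, plus the line
    from the original spam, so the rule can be revisited if the spammer morphs.
    """
    added = added or date.today()
    return "\n".join([
        f"# {name}: added {added.isoformat()}",
        f"# Why: {why}",
        f"# Spam line matched: {spam_line}",
        f"header {name} {header} =~ {pattern}",
        f"score {name} {score}",
    ]) + "\n"


print(documented_rule(
    "L_s_CorelWPOffice", "Subject",
    r"/(?:corel|wordperfect).{1,15}office.{4,20}[%\$]/i",
    0.5,  # hypothetical score
    "WP Office pushed with a price or discount; plain mentions are ham",
    "Subject: Corel WordPerfect Office 11 - only $49 (hypothetical)",
    added=date(2003, 8, 8),
))
```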
  
-- 
Yorkshire Dave


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
