On Fri, 14 Nov 2003, Carl R. Friend wrote:

>    For the assembled group -- is it possible to do a DB lookup,
> either in an eval() or some other mechanism, in a "uri" rule?
> If we could do a DB lookup on URIs (or, more properly, the
> domain portion of URIs) I think that'd be a win (at, of course,
> the expense in human time).
>

I've been thinking about that exact topic. The Bayes engine already parses and tokenizes hostnames from URIs (the UD: tokens). If there were a hash DB with the spam-site hostname as key and "score,description" as value (something like the sendmail access db), it should be pretty easy to take those UD: tokens, look each one up, and add the results to the total score.

It would be much faster and use less memory than the various collections of regex spam-host rules that have been discussed here (such as William's or Chris's "evilrules" ;). (I have a 15,000-line sendmail access db that doesn't bother sendmail a bit ;).

Another advantage is that it would be possible to update the database 'hot' (i.e. without having to kill and restart spamd, the way you have to when updating regex rules). It might even be possible to automate the updating of the database (take hostnames found by Bayes in spam, do a DNS lookup, and add the host if its IP is in Spamhaus nets, is in trusted DSBLs, has a short TTL, etc).

I can see one of two different implementations:

1) Have the value be just "score,description" and synthesize the rule
   name from the hostname, e.g.:

     spammer.com     1.2,Spamhaus business site

   -> rule == L_URI_SPAMMER_COM
      score == 1.2
      description == "Spamhaus business site"

2) Have the value be a triple, "name,score,description", and explicitly
   store all attributes:

     spammer.com     MEDS_SPAMHAUS,1.2,Spamhaus business site

Option 1 would be simpler to update and use less memory; option 2 would be more flexible and let you combine several different sites into one class of rule. Assumption: once a host matches a rule, subsequent matches on that rule name would be ignored.

Probably should also add some kind of time-stamp to each entry to facilitate automated updating.

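A rough sketch of that lookup, in Python rather than SpamAssassin's Perl, may make the two value formats concrete. Everything named here (`parse_entry`, `score_hosts`, the `.example` hostnames, and telling the two formats apart by whether the first field parses as a number) is an assumption for illustration, not existing SpamAssassin code. A plain dict stands in for the on-disk hash DB, and the list of hostnames stands in for the UD: tokens.

```python
import re


def parse_entry(hostname, value):
    """Return (rule_name, score, description) for one DB value.

    Assumed formats (hypothetical, following the two options above):
      option 1: "score,description"       -> rule name built from hostname
      option 2: "name,score,description"  -> rule name stored explicitly
    """
    first, rest = value.split(",", 1)
    try:
        score = float(first)
    except ValueError:
        # First field is not a number, so treat it as an explicit rule name.
        score_text, description = rest.split(",", 1)
        return first, float(score_text), description
    # Option 1: synthesize the name, e.g. spammer.com -> L_URI_SPAMMER_COM.
    name = "L_URI_" + re.sub(r"[^A-Za-z0-9]+", "_", hostname).upper()
    return name, score, rest


def score_hosts(hosts, db):
    """Look up each hostname and sum the scores, counting each rule once."""
    total = 0.0
    hits = {}
    for host in hosts:
        key = host.lower()
        value = db.get(key)
        if value is None:
            continue  # hostname is not in the database
        name, score, description = parse_entry(key, value)
        if name in hits:
            continue  # later matches on the same rule name are ignored
        hits[name] = (score, description)
        total += score
    return total, hits


# A dict stands in for the hash DB; keys and values are made-up examples.
db = {
    "spammer.com": "1.2,Spamhaus business site",
    "pills.example": "MEDS_SPAMHAUS,1.2,Spamhaus business site",
    "morepills.example": "MEDS_SPAMHAUS,1.2,Spamhaus business site",
}
hosts = ["spammer.com", "pills.example", "morepills.example", "clean.example"]
total, hits = score_hosts(hosts, db)
```

Here the two `.example` hosts share the rule name MEDS_SPAMHAUS, so only the first one contributes to the total. Because the function only calls `db.get()`, a real on-disk hash DB exposing the same mapping interface could presumably be swapped in and updated without restarting the process doing the lookups.
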
--
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{