> -----Original Message-----
> From: Matt Kettler [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 17, 2006 18:47
> To: Matt Kettler
> Cc: Jeff Chan; users@spamassassin.apache.org
> Subject: Re: Over-scoring of SURBL lists...
>
> Matt Kettler wrote:
> >
> > I'll even re-quote myself:
> >> I personally would like to see some statistics, but at this point,
> >> we don't have any test data on this so we're arguing your theory
> >> vs mine.
> > And your quote that I was counter-pointing:
> >> As you can see the performance of the lists are different, and the
> >> way they're created is different too.
> >
> > I don't see enough of a difference to clearly rule out significant
> > overlap.
> >
> > I'll define my test of "significant overlap" as:
> > >10% of total hits redundant across 3 or more lists and >1% nonspam
> > hits redundant across 2 or more lists.
> >
>
> Messages received today that are double-listed in two or more of
> SC, JP, AB, OB and WS:
>
> grep "SURBL_MULTI2" /var/log/maillog |grep "Feb 17" |wc -l
> 292
>
> All surbl.org hits in same timeframe (includes ph, but no matter):
>
> grep "_SURBL" /var/log/maillog |grep "Feb 17" |wc -l
> 583
>
> So we at least have a 50% double-listing rate. That in-and-of-itself
> isn't much of a problem, but it also doesn't rule out overlap. It's
> still a whole lot higher than my first criteria of 10% overlap
>
> However, right now I don't have more than 100 FPs so I can't really
> comment on the nonspam hit rate of SURBL_MULTI2. That's the
> important one.
>
> I also added multi3, multi4 and another rule to detect overlap
> between uribl.com's black and surbl.org:
>
> meta URIBL_BLACK_OVERLAP (URIBL_BLACK && (URIBL_AB_SURBL || URIBL_JP_SURBL || URIBL_OB_SURBL || URIBL_WS_SURBL || URIBL_SC_SURBL))
> score URIBL_BLACK_OVERLAP -1.0
>
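The 50% figure quoted above is just the ratio of the two grep counts (292 SURBL_MULTI2 lines out of 583 SURBL lines). A minimal Python sketch of that arithmetic follows; the function name is made up for illustration, and the counts are the ones from the quoted message, not fresh data.

```python
# Sketch only: double_listing_rate is a hypothetical helper name.
# Inputs are the two "wc -l" counts quoted in the message above.

def double_listing_rate(multi_hits, total_hits):
    """Return the fraction of SURBL log lines that also hit SURBL_MULTI2."""
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    return multi_hits / total_hits

rate = double_listing_rate(292, 583)
print(f"{rate:.1%}")   # prints 50.1%
print(rate > 0.10)     # prints True: above the 10% figure in the quoted test
```

Note that this compares a two-or-more-lists count against a threshold that was stated for three or more lists, so it is only a rough check.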
If anyone is interested, here is an alternative scoring method for
25_uribl.cf:

http://www.uribl.com/tools/25_uribl.cf

(Make sure you wipe out the scores for the uribl tests in 50_scores.cf
if you replace this file.)

This should make SBL/URIBL/SURBL hits range in score from 2.0 to 5.5:

 - 2.0 (SBL ONLY)
 - 2.5 (URIBL_ONLY)
 - 2.5 (SURBL_ONLY)
 - 3.0 (SBL + URIBL)
 - 3.0 (SBL + SURBL)
 - 3.0 (SURBL_ONLY x2)
 - 4.0 (URIBL + SURBL)
 - 5.0 (SBL + URIBL + SURBL)
 - 5.5 (SBL + URIBL + SURBLx2)

If you want to reduce the possibility of URIBL-only FPs, this is the
way to go.

D
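To sanity-check the combined scores listed above, they can be written out as a plain lookup table. This is only a rough Python sketch of the list, not how 25_uribl.cf itself is implemented. The names SCORES and combined_score are made up for illustration, "x2" is assumed to mean exactly two SURBL lists matched, and combinations that are not in the list above return None rather than a guessed value.

```python
# Sketch only: a lookup of the nine combinations listed above.
# Key = (sbl_hit, uribl_hit, surbl_lists), where surbl_lists is the number
# of SURBL lists that matched (assumption: "x2" means exactly two).
SCORES = {
    (True,  False, 0): 2.0,  # SBL only
    (False, True,  0): 2.5,  # URIBL only
    (False, False, 1): 2.5,  # SURBL only
    (True,  True,  0): 3.0,  # SBL + URIBL
    (True,  False, 1): 3.0,  # SBL + SURBL
    (False, False, 2): 3.0,  # SURBL only, two lists
    (False, True,  1): 4.0,  # URIBL + SURBL
    (True,  True,  1): 5.0,  # SBL + URIBL + SURBL
    (True,  True,  2): 5.5,  # SBL + URIBL + SURBL, two lists
}

def combined_score(sbl, uribl, surbl_lists):
    """Return the listed score, or None if the combination is not listed."""
    return SCORES.get((bool(sbl), bool(uribl), int(surbl_lists)))

print(combined_score(False, True, 0))   # URIBL only
print(combined_score(True, True, 2))    # all three, two SURBL lists
print(combined_score(False, True, 2))   # not in the list above
```

The point to notice is that a URIBL-only hit stays at 2.5, and the score only climbs toward 5.0 and above once a second, independent source agrees.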