On Friday, February 17, 2006, 7:19:50 AM, Matt Kettler wrote:
> Jeff Chan wrote:
>> On Thursday, February 16, 2006, 9:13:36 PM, Matt Kettler wrote:
>>
>>> I'm only presenting evidence of accuracy problems in relation to why the
>>> URIBLs collectively wield a great deal of power in SpamAssassin scoring.
>>> I'm not really complaining about uribl.com; I'm complaining about URIBLs
>>> as a whole. That's both uribl.com and surbl. Whenever I use the term
>>> URIBL in all caps, I mean all URI dns-based blacklists. If you prefer,
>>> I'll retract my uribl.com example, and point out that less than an hour
>>> later, I got a ws.surbl.org FP.
>>>
>>
>> There may be some value in not lumping together URIBL.com and
>> SURBL.org lists. As you can see, the performance of the lists is
>> different, and the way they're created is different too. That
>> makes it harder for us to respond to comments that seem to not
>> take those differences into account.
>>
> Did you see Theo's test data from yesterday?

Yes. I was referring to lumping URIBL.com with SURBL.org, mostly.

>   35.418  41.1930   0.0000   1.000   0.90   0.00  URIBL_JP_SURBL
>   34.665  40.3177   0.0000   1.000   0.88   0.00  URIBL_SC_SURBL
>   26.069  30.3204   0.0000   1.000   0.80   0.00  URIBL_AB_SURBL
>   28.024  32.5464   0.2915   0.991   0.61   0.00  URIBL_OB_SURBL
>   48.113  55.7492   1.2873   0.977   0.55   0.00  URIBL_BLACK
>    0.293   0.3406   0.0000   1.000   0.47   0.00  URIBL_PH_SURBL
>    0.000   0.0000   0.0000   0.500   0.42   0.00  URIBL_RED
>    0.000   0.0000   0.0000   0.500   0.42   0.01  T_URIBL_XS_SURBL
>   37.539  42.4763   7.2626   0.854   0.38   0.00  URIBL_WS_SURBL
>    0.548   0.3446   1.7974   0.161   0.03   0.00  URIBL_GREY
>
> I consider that "highly similar" for JP, SC, AB, OB and WS.

As similar as spam hit rates of 30% versus 40%, and ham hit rates of
0%, 0.3% and 7%, I suppose.

> Also, even if there are some differences, even 10% overlap would have
> the effect I'm talking about.
>
> I personally would like to see some statistics, but at this point, we
> don't have any test data on this, so we're arguing your theory vs mine.
>
> I'd love to see some results for some meta tests:
>
> meta SURBL_MULTI2 ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL +
> URIBL_OB_SURBL + URIBL_WS_SURBL) >2)
> meta SURBL_MULTI3 ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL +
> URIBL_OB_SURBL + URIBL_WS_SURBL) >3)
> meta SURBL_MULTI4 ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL +
> URIBL_OB_SURBL + URIBL_WS_SURBL) >4)
>
> In particular, I'm concerned about the ham hits of even multi 2.

I'd be concerned about it too, but it seldom seems to happen.

> Theo?
>
>>> 3) I'm even more concerned about the monoculture of the URIBLs.
>>>
>>
>> I suppose it depends on your point of view. From my point of
>> view the various lists are different in terms of sources and
>> listing logic. As you can see from the results posted, they have
>> fairly different performance in terms of spam and ham hits, but
>> those measurements don't take into account the underlying
>> tools and sources that go into making them, which vary between
>> lists.
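Two small calculations may help when reading the figures and rules quoted above. This sketch assumes the results use SpamAssassin's hit-frequencies column order (OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME). Under that assumption, the S/O column matches SPAM% / (SPAM% + HAM%) for the quoted rows, and a rule that hit nothing at all reports 0.5. The sketch also assumes each SURBL sub-rule adds 1 to the proposed meta expressions when it hits and 0 otherwise. On that reading, ">2" in SURBL_MULTI2 fires only when three or more of the five lists hit. The function names are illustrative and not part of SpamAssassin.

```python
# Sketch only. ASSUMPTIONS: hit-frequencies column order is
# OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME; meta sub-rules count as 0 or 1.

def spam_over_ham(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hit rate that comes from spam (the S/O column).

    Returns 0.5 when the rule hit neither spam nor ham, as in the
    URIBL_RED and T_URIBL_XS_SURBL rows.
    """
    total = spam_pct + ham_pct
    if total == 0:
        return 0.5
    return spam_pct / total


def lists_hit(hits: dict[str, bool]) -> int:
    """Count how many sub-rules fired, as the proposed SURBL_MULTI* metas sum them."""
    return sum(1 for fired in hits.values() if fired)


# Rows copied from the quoted results: (SPAM%, HAM%, reported S/O).
ROWS = {
    "URIBL_OB_SURBL": (32.5464, 0.2915, 0.991),
    "URIBL_BLACK": (55.7492, 1.2873, 0.977),
    "URIBL_WS_SURBL": (42.4763, 7.2626, 0.854),
    "URIBL_GREY": (0.3446, 1.7974, 0.161),
    "URIBL_RED": (0.0, 0.0, 0.500),
}

if __name__ == "__main__":
    for name, (spam, ham, reported) in ROWS.items():
        computed = spam_over_ham(spam, ham)
        print(f"{name:16} computed S/O={computed:.3f} reported={reported:.3f}")

    # Hypothetical message whose link is listed on JP, SC and WS only.
    example = {
        "URIBL_JP_SURBL": True,
        "URIBL_SC_SURBL": True,
        "URIBL_AB_SURBL": False,
        "URIBL_OB_SURBL": False,
        "URIBL_WS_SURBL": True,
    }
    n = lists_hit(example)
    print("lists hit:", n, "MULTI2 (>2):", n > 2, "MULTI3 (>3):", n > 3)
```

For these rows, the computed and reported S/O values agree to three decimal places. The hypothetical three-list message would trigger SURBL_MULTI2 but not SURBL_MULTI3.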
>>
> I don't see the difference from the recent results posted by Theo.

That's like saying two different RBLs that hit a similar percentage of
spam must therefore have the same policies, even when they may have no
data in common. That conclusion can't be drawn from that kind of
measurement.

>>> uribl.com's black, surbl.org's ws, sc, jp, ab and ob are all
>>> more-or-less the same list. Paul argued against that statement, but in
>>> my mind his arguments are weak at best. There IS considerable overlap
>>> between these lists. Contrary to Paul's statements, you only need to be
>>> reported once by a spamcop spamtrap or trusted feed to be on SC.
>>>
>>
>> That's only partially correct. Paul's statement is correct for
>> most SpamCop reports. It takes many reports to get on SC for
>> most domains except the ones that resolve into known spammer
>> networks.
>>
>> There are no "trusted feeds" for SC.
> Not on your end, but keep in mind that spamcop trusts their spamtraps
> with a 5x bias.

Our feeds are SpamCop user and mole reports, not SpamCop trap data.

>> The data in SC comes from
>> SpamCop reports. I don't know the number of SpamCop users, but
>> they're probably many. The way I deal with the issue of trust is
>> to aggregate the reports in various ways and ignore some of the
>> noise that would lead to FPs. And all SURBL lists are subject to
>> whitelisting as a final arbiter. So even if a SpamCop user
>> wanted us to blacklist, say, google.com or yahoo.com, we won't.
>>
>>
>>> JP monitors 18,000 domains, not just two people. AB accepts feeds directly
>>> from spamcop and does different analysis on them. Ultimately it is
>>> possible for a single copy of an email to cause a listing in
>>> uribl_black, SC, WS, JP, and OB all at the same time.
>>>
>>
>> Not really. It takes a fairly large and widespread spam run to
>> get onto multiple (SURBL) lists.
> So why do some small-spread legitimate mailings with special-purpose
> domains end up multi-listed? I've seen this happen a number of times in
> the past 3 weeks. This *IS* real.
>
> It's not terribly common in terms of % of email, but maybe 1 in 1000 ham
> mails I get has a double-listed link in it.

I don't know. It's hard to judge in the abstract. Perhaps you'd care
to name an example.

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/