Jeff, I had a look at your list at some random time a few days ago. I
noticed that the top 90% or so of the reports looked pretty solid. At the
instant I looked, the bottom 10% of the reports were almost all highly
suspect. This is where Yahoo, GeoCities, and other whitelisted domains
was showing up. Some other reports (and I can't remember what any of them
were) also seemed somewhat suspect, even though they probably weren't on a
whitelist.
I concluded that only the top 90% of your reports should be used in the
blocking test, ignoring any report that scores less than 10% of the
highest-scoring report. Perhaps this percentage fluctuates with time; I
certainly haven't made multiple checks to see. And maybe after whitelist
removal the rest of the bottom 10% really is spam.
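
To make that cutoff concrete, here is a rough Python sketch. It assumes each
report is just a name with a single numeric score; the domain names, the
counts, and the function name keep_high_scoring are all made up for
illustration and are not taken from the actual list or its tooling.

```python
def keep_high_scoring(reports, fraction=0.10):
    """Keep reports whose score is at least `fraction` of the top score.

    `reports` maps a reported name to its score (an assumed data layout).
    """
    if not reports:
        return {}
    cutoff = fraction * max(reports.values())
    return {name: score for name, score in reports.items() if score >= cutoff}


# Hypothetical data, for illustration only.
sample = {
    "spam-one.example": 200,
    "spam-two.example": 75,
    "yahoo.com": 12,
}

kept = keep_high_scoring(sample)
for name, score in sorted(kept.items(), key=lambda item: -item[1]):
    print(name, score)
```

With these made-up numbers the cutoff is 10% of 200, so the low-scoring
whitelist-style entry is dropped and the other two are kept. Whether 10% is
the right fraction is exactly the open question.
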
But I think it would be an interesting experiment to compare the reliability
of the top 90% to the reliability of the entire collection.
Loren