Matt Kettler wrote:
> Jeff Chan wrote:
>   
>> There may be some value in not lumping together the URIBL.com and
>> SURBL.org lists.  As you can see, the performance of the lists is
>> different, and the way they're created is different too.  That
>> makes it harder for us to respond to comments that don't seem to
>> take those differences into account.  
>>     
> Did you see Theo's test data from yesterday?
>
> OVERALL%   SPAM%     HAM%      S/O   RANK   SCORE  NAME
>  35.418  41.1930   0.0000    1.000   0.90    0.00  URIBL_JP_SURBL
>  34.665  40.3177   0.0000    1.000   0.88    0.00  URIBL_SC_SURBL
>  26.069  30.3204   0.0000    1.000   0.80    0.00  URIBL_AB_SURBL
>  28.024  32.5464   0.2915    0.991   0.61    0.00  URIBL_OB_SURBL
>  48.113  55.7492   1.2873    0.977   0.55    0.00  URIBL_BLACK
>   0.293   0.3406   0.0000    1.000   0.47    0.00  URIBL_PH_SURBL
>   0.000   0.0000   0.0000    0.500   0.42    0.00  URIBL_RED
>   0.000   0.0000   0.0000    0.500   0.42    0.01  T_URIBL_XS_SURBL
>  37.539  42.4763   7.2626    0.854   0.38    0.00  URIBL_WS_SURBL
>   0.548   0.3446   1.7974    0.161   0.03    0.00  URIBL_GREY
>
> I consider that "highly similar" for JP, SC, AB, OB and WS.
>
> Also, even if there are some differences, even 10% overlap would have
> the effect I'm talking about.
>
> I personally would like to see some statistics, but at this point we
> don't have any test data on this, so we're arguing your theory vs mine.
>
> I'd love to see some results for some meta tests:
>
> meta SURBL_MULTI2   ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL + URIBL_OB_SURBL + URIBL_WS_SURBL) > 2)
> meta SURBL_MULTI3   ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL + URIBL_OB_SURBL + URIBL_WS_SURBL) > 3)
> meta SURBL_MULTI4   ((URIBL_JP_SURBL + URIBL_SC_SURBL + URIBL_AB_SURBL + URIBL_OB_SURBL + URIBL_WS_SURBL) > 4)
>   
I whipped up a short script to calculate these stats on my spam corpus
(realtime data).  First of all, the hit rate is quite impressive: over
the last 3 months (most recent first), 67%, 74% and 72% of my spam hit
at least one list.  However, it looks like about 45-50% of the spam hit
4 or 5 SURBL lists.
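
To make the arithmetic behind those figures explicit, here is a small Python sketch that recomputes them from the "1 months old" bucket in the stats further down. The helper name `share_at_least` is made up for illustration, and the message does not say exactly which lists were counted, so lining the thresholds up with the quoted meta rules (>2, >3, >4) is only an assumption.

```python
# Counts copied from the "Stats for SPAM 1 months old" bucket:
# index = number of lists hit, value = number of spam messages.
counts = [120, 14, 25, 40, 66, 100, 0]
total = sum(counts)  # 365 messages in that bucket


def share_at_least(counts, k):
    """Fraction of messages that hit k or more lists (hypothetical helper)."""
    return sum(counts[k:]) / sum(counts)


hit_rate = share_at_least(counts, 1)            # hit at least one list
four_or_five = (counts[4] + counts[5]) / total  # hit exactly 4 or 5 lists

print(f"hit rate:     {hit_rate:.1%}")      # 67.1%
print(f"4 or 5 lists: {four_or_five:.1%}")  # 45.5%

# Assumption: "> 2", "> 3" and "> 4" in the quoted meta rules correspond
# to "at least 3, 4, 5" in this histogram.
for threshold in (3, 4, 5):
    print(f"at least {threshold} lists: {share_at_least(counts, threshold):.1%}")
```

The same calculation on the 2 and 3 month buckets gives the other two hit rates and the upper end of the 45-50% range.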

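My ham corpus looked clean of URIBL hits.  Sorry for the ugly formatting
of the stats below: each row is the number of lists hit, followed by the
percentage and count of that month's spam.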

Note: the month buckets listed aren't exact because they're based on
the Date header supplied by the spammer, not on the date in the
Received header.  This should be good enough to get an idea, though.
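
For anyone wanting to reproduce this kind of breakdown, a minimal Python sketch of the bucketing might look like the following. It is not the script used for the numbers in this message. The function names, the set of lists (taken from the quoted meta rules), and the calendar-month definition of "months old" are all assumptions, and the sample messages are made up.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumption: count the five lists named in the quoted meta rules.
SURBL_LISTS = frozenset({
    "URIBL_JP_SURBL", "URIBL_SC_SURBL", "URIBL_AB_SURBL",
    "URIBL_OB_SURBL", "URIBL_WS_SURBL",
})


def months_old(date_header, now):
    """Calendar months between a Date header and `now`; None if unparseable."""
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return None
    if sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    sent = sent.astimezone(timezone.utc)
    return (now.year - sent.year) * 12 + (now.month - sent.month)


def histogram(messages, lists, now):
    """messages: iterable of (date_header, rules_hit) pairs.

    Returns {months_old: Counter({lists_hit: message_count})}.
    """
    buckets = defaultdict(Counter)
    for date_header, rules_hit in messages:
        age = months_old(date_header, now)
        if age is None:
            continue  # Date header missing or too mangled to parse
        buckets[age][len(set(rules_hit) & lists)] += 1
    return buckets


if __name__ == "__main__":
    # Made-up sample data, only to show the call shape.
    now = datetime(2000, 3, 15, tzinfo=timezone.utc)
    sample = [
        ("Tue, 01 Feb 2000 10:00:00 +0000", ["URIBL_JP_SURBL", "URIBL_SC_SURBL"]),
        ("Wed, 15 Mar 2000 00:00:00 +0000", []),
    ]
    for age, hits in sorted(histogram(sample, SURBL_LISTS, now).items()):
        print(age, dict(hits))
```

Bucketing by arrival time instead would mean parsing the timestamp after the final ";" of the topmost Received header, which this sketch does not attempt.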


Chris Thielen

Stats for SPAM 38 months old:
0: 98.5% ( 268 / 272 )
1: 0.0% ( 0 / 272 )
2: 0.0% ( 0 / 272 )
3: 0.7% ( 2 / 272 )
4: 0.0% ( 0 / 272 )
5: 0.7% ( 2 / 272 )
6: 0.0% ( 0 / 272 )
Stats for SPAM 37 months old:
0: 96.6% ( 281 / 291 )
1: 0.7% ( 2 / 291 )
2: 0.0% ( 0 / 291 )
3: 0.3% ( 1 / 291 )
4: 1.4% ( 4 / 291 )
5: 1.0% ( 3 / 291 )
6: 0.0% ( 0 / 291 )
Stats for SPAM 36 months old:
0: 96.5% ( 277 / 287 )
1: 0.7% ( 2 / 287 )
2: 0.3% ( 1 / 287 )
3: 0.3% ( 1 / 287 )
4: 1.0% ( 3 / 287 )
5: 1.0% ( 3 / 287 )
6: 0.0% ( 0 / 287 )
Stats for SPAM 35 months old:
0: 97.5% ( 234 / 240 )
1: 0.4% ( 1 / 240 )
2: 0.4% ( 1 / 240 )
3: 0.0% ( 0 / 240 )
4: 0.8% ( 2 / 240 )
5: 0.8% ( 2 / 240 )
6: 0.0% ( 0 / 240 )
Stats for SPAM 34 months old:
0: 39.5% ( 118 / 299 )
1: 11.7% ( 35 / 299 )
2: 11.7% ( 35 / 299 )
3: 11.0% ( 33 / 299 )
4: 25.8% ( 77 / 299 )
5: 0.3% ( 1 / 299 )
6: 0.0% ( 0 / 299 )
Stats for SPAM 33 months old:
0: 24.0% ( 76 / 317 )
1: 20.8% ( 66 / 317 )
2: 11.7% ( 37 / 317 )
3: 12.0% ( 38 / 317 )
4: 30.9% ( 98 / 317 )
5: 0.6% ( 2 / 317 )
6: 0.0% ( 0 / 317 )
Stats for SPAM 32 months old:
0: 23.6% ( 66 / 280 )
1: 18.2% ( 51 / 280 )
2: 13.6% ( 38 / 280 )
3: 13.2% ( 37 / 280 )
4: 30.7% ( 86 / 280 )
5: 0.7% ( 2 / 280 )
6: 0.0% ( 0 / 280 )
Stats for SPAM 31 months old:
0: 27.4% ( 80 / 292 )
1: 9.2% ( 27 / 292 )
2: 10.6% ( 31 / 292 )
3: 19.9% ( 58 / 292 )
4: 32.9% ( 96 / 292 )
5: 0.0% ( 0 / 292 )
6: 0.0% ( 0 / 292 )
Stats for SPAM 30 months old:
0: 27.4% ( 83 / 303 )
1: 14.9% ( 45 / 303 )
2: 14.9% ( 45 / 303 )
3: 10.6% ( 32 / 303 )
4: 32.3% ( 98 / 303 )
5: 0.0% ( 0 / 303 )
6: 0.0% ( 0 / 303 )
Stats for SPAM 29 months old:
0: 27.1% ( 82 / 303 )
1: 13.5% ( 41 / 303 )
2: 11.6% ( 35 / 303 )
3: 15.8% ( 48 / 303 )
4: 19.8% ( 60 / 303 )
5: 12.2% ( 37 / 303 )
6: 0.0% ( 0 / 303 )
Stats for SPAM 28 months old:
0: 14.4% ( 40 / 277 )
1: 11.9% ( 33 / 277 )
2: 17.7% ( 49 / 277 )
3: 15.2% ( 42 / 277 )
4: 16.6% ( 46 / 277 )
5: 24.2% ( 67 / 277 )
6: 0.0% ( 0 / 277 )
Stats for SPAM 27 months old:
0: 18.3% ( 56 / 306 )
1: 9.2% ( 28 / 306 )
2: 18.6% ( 57 / 306 )
3: 15.4% ( 47 / 306 )
4: 13.7% ( 42 / 306 )
5: 24.8% ( 76 / 306 )
6: 0.0% ( 0 / 306 )
Stats for SPAM 26 months old:
0: 21.8% ( 49 / 225 )
1: 10.2% ( 23 / 225 )
2: 20.0% ( 45 / 225 )
3: 14.2% ( 32 / 225 )
4: 12.0% ( 27 / 225 )
5: 21.8% ( 49 / 225 )
6: 0.0% ( 0 / 225 )
Stats for SPAM 25 months old:
0: 22.2% ( 59 / 266 )
1: 13.9% ( 37 / 266 )
2: 19.2% ( 51 / 266 )
3: 13.2% ( 35 / 266 )
4: 18.0% ( 48 / 266 )
5: 13.5% ( 36 / 266 )
6: 0.0% ( 0 / 266 )
Stats for SPAM 24 months old:
0: 20.4% ( 51 / 250 )
1: 13.2% ( 33 / 250 )
2: 17.6% ( 44 / 250 )
3: 16.8% ( 42 / 250 )
4: 14.0% ( 35 / 250 )
5: 18.0% ( 45 / 250 )
6: 0.0% ( 0 / 250 )
Stats for SPAM 23 months old:
0: 21.2% ( 60 / 283 )
1: 8.8% ( 25 / 283 )
2: 20.1% ( 57 / 283 )
3: 14.1% ( 40 / 283 )
4: 14.1% ( 40 / 283 )
5: 21.6% ( 61 / 283 )
6: 0.0% ( 0 / 283 )
Stats for SPAM 22 months old:
0: 17.3% ( 39 / 226 )
1: 8.8% ( 20 / 226 )
2: 15.0% ( 34 / 226 )
3: 21.7% ( 49 / 226 )
4: 14.2% ( 32 / 226 )
5: 23.0% ( 52 / 226 )
6: 0.0% ( 0 / 226 )
Stats for SPAM 21 months old:
0: 23.2% ( 38 / 164 )
1: 15.9% ( 26 / 164 )
2: 15.2% ( 25 / 164 )
3: 12.2% ( 20 / 164 )
4: 11.0% ( 18 / 164 )
5: 22.6% ( 37 / 164 )
6: 0.0% ( 0 / 164 )
Stats for SPAM 20 months old:
0: 19.5% ( 34 / 174 )
1: 9.8% ( 17 / 174 )
2: 21.8% ( 38 / 174 )
3: 6.9% ( 12 / 174 )
4: 11.5% ( 20 / 174 )
5: 30.5% ( 53 / 174 )
6: 0.0% ( 0 / 174 )
Stats for SPAM 19 months old:
0: 23.1% ( 49 / 212 )
1: 10.8% ( 23 / 212 )
2: 24.1% ( 51 / 212 )
3: 9.4% ( 20 / 212 )
4: 11.8% ( 25 / 212 )
5: 20.8% ( 44 / 212 )
6: 0.0% ( 0 / 212 )
Stats for SPAM 18 months old:
0: 21.2% ( 43 / 203 )
1: 12.3% ( 25 / 203 )
2: 16.7% ( 34 / 203 )
3: 18.2% ( 37 / 203 )
4: 8.4% ( 17 / 203 )
5: 23.2% ( 47 / 203 )
6: 0.0% ( 0 / 203 )
Stats for SPAM 17 months old:
0: 23.1% ( 45 / 195 )
1: 10.3% ( 20 / 195 )
2: 11.8% ( 23 / 195 )
3: 13.8% ( 27 / 195 )
4: 15.9% ( 31 / 195 )
5: 25.1% ( 49 / 195 )
6: 0.0% ( 0 / 195 )
Stats for SPAM 16 months old:
0: 25.0% ( 60 / 240 )
1: 12.5% ( 30 / 240 )
2: 21.7% ( 52 / 240 )
3: 10.0% ( 24 / 240 )
4: 10.4% ( 25 / 240 )
5: 20.4% ( 49 / 240 )
6: 0.0% ( 0 / 240 )
Stats for SPAM 15 months old:
0: 18.4% ( 47 / 256 )
1: 13.7% ( 35 / 256 )
2: 18.4% ( 47 / 256 )
3: 12.5% ( 32 / 256 )
4: 16.8% ( 43 / 256 )
5: 20.3% ( 52 / 256 )
6: 0.0% ( 0 / 256 )
Stats for SPAM 14 months old:
0: 30.5% ( 73 / 239 )
1: 8.8% ( 21 / 239 )
2: 13.0% ( 31 / 239 )
3: 10.9% ( 26 / 239 )
4: 11.7% ( 28 / 239 )
5: 25.1% ( 60 / 239 )
6: 0.0% ( 0 / 239 )
Stats for SPAM 13 months old:
0: 19.7% ( 43 / 218 )
1: 10.1% ( 22 / 218 )
2: 11.9% ( 26 / 218 )
3: 15.6% ( 34 / 218 )
4: 19.7% ( 43 / 218 )
5: 22.9% ( 50 / 218 )
6: 0.0% ( 0 / 218 )
Stats for SPAM 12 months old:
0: 24.9% ( 62 / 249 )
1: 14.9% ( 37 / 249 )
2: 11.6% ( 29 / 249 )
3: 11.2% ( 28 / 249 )
4: 18.1% ( 45 / 249 )
5: 19.3% ( 48 / 249 )
6: 0.0% ( 0 / 249 )
Stats for SPAM 11 months old:
0: 24.9% ( 53 / 213 )
1: 9.9% ( 21 / 213 )
2: 8.5% ( 18 / 213 )
3: 9.9% ( 21 / 213 )
4: 22.5% ( 48 / 213 )
5: 24.4% ( 52 / 213 )
6: 0.0% ( 0 / 213 )
Stats for SPAM 10 months old:
0: 18.0% ( 36 / 200 )
1: 11.5% ( 23 / 200 )
2: 9.0% ( 18 / 200 )
3: 12.5% ( 25 / 200 )
4: 29.0% ( 58 / 200 )
5: 20.0% ( 40 / 200 )
6: 0.0% ( 0 / 200 )
Stats for SPAM 9 months old:
0: 27.2% ( 59 / 217 )
1: 6.9% ( 15 / 217 )
2: 8.3% ( 18 / 217 )
3: 14.7% ( 32 / 217 )
4: 25.8% ( 56 / 217 )
5: 17.1% ( 37 / 217 )
6: 0.0% ( 0 / 217 )
Stats for SPAM 8 months old:
0: 25.1% ( 63 / 251 )
1: 6.0% ( 15 / 251 )
2: 12.4% ( 31 / 251 )
3: 9.6% ( 24 / 251 )
4: 24.3% ( 61 / 251 )
5: 22.7% ( 57 / 251 )
6: 0.0% ( 0 / 251 )
Stats for SPAM 7 months old:
0: 23.1% ( 54 / 234 )
1: 7.7% ( 18 / 234 )
2: 9.4% ( 22 / 234 )
3: 8.5% ( 20 / 234 )
4: 18.8% ( 44 / 234 )
5: 32.5% ( 76 / 234 )
6: 0.0% ( 0 / 234 )
Stats for SPAM 6 months old:
0: 29.8% ( 78 / 262 )
1: 5.0% ( 13 / 262 )
2: 9.2% ( 24 / 262 )
3: 13.0% ( 34 / 262 )
4: 21.4% ( 56 / 262 )
5: 21.8% ( 57 / 262 )
6: 0.0% ( 0 / 262 )
Stats for SPAM 5 months old:
0: 29.3% ( 65 / 222 )
1: 10.8% ( 24 / 222 )
2: 10.4% ( 23 / 222 )
3: 10.8% ( 24 / 222 )
4: 16.2% ( 36 / 222 )
5: 22.5% ( 50 / 222 )
6: 0.0% ( 0 / 222 )
Stats for SPAM 4 months old:
0: 19.6% ( 44 / 224 )
1: 4.9% ( 11 / 224 )
2: 3.6% ( 8 / 224 )
3: 12.5% ( 28 / 224 )
4: 23.7% ( 53 / 224 )
5: 35.7% ( 80 / 224 )
6: 0.0% ( 0 / 224 )
Stats for SPAM 3 months old:
0: 28.3% ( 77 / 272 )
1: 5.5% ( 15 / 272 )
2: 7.0% ( 19 / 272 )
3: 10.3% ( 28 / 272 )
4: 19.9% ( 54 / 272 )
5: 29.0% ( 79 / 272 )
6: 0.0% ( 0 / 272 )
Stats for SPAM 2 months old:
0: 26.0% ( 81 / 311 )
1: 8.4% ( 26 / 311 )
2: 6.1% ( 19 / 311 )
3: 8.7% ( 27 / 311 )
4: 18.0% ( 56 / 311 )
5: 32.8% ( 102 / 311 )
6: 0.0% ( 0 / 311 )
Stats for SPAM 1 months old:
0: 32.9% ( 120 / 365 )
1: 3.8% ( 14 / 365 )
2: 6.8% ( 25 / 365 )
3: 11.0% ( 40 / 365 )
4: 18.1% ( 66 / 365 )
5: 27.4% ( 100 / 365 )
6: 0.0% ( 0 / 365 )
