"Loren Wilton" <[EMAIL PROTECTED]> writes:

> I'm trying to come up with a way to detect bogus end tags, and so far I'm
> not having much luck.

3.0 will have a test for this, although it only looks at bad tags in
general rather than end tags specifically.  Someone could try enhancing
the test to also notice when an end tag is used for a start-only element.
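
To make the suggested enhancement concrete, here is a rough sketch of
counting end tags that close start-only elements.  It is in Python rather
than the Perl SpamAssassin is written in, and it is not the 3.0 test
itself.  The names START_ONLY_TAGS, BogusEndTagCounter and
bogus_end_tag_ratio are made up for illustration, and the tag list is an
assumption based on the HTML 4 elements whose end tag is forbidden.

```python
from html.parser import HTMLParser

# Assumption: HTML 4 elements that must not have an end tag.
START_ONLY_TAGS = frozenset({
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})


class BogusEndTagCounter(HTMLParser):
    """Counts explicit end tags and how many close a start-only element."""

    def __init__(self):
        super().__init__()
        self.end_tags = 0
        self.bogus_end_tags = 0

    def handle_startendtag(self, tag, attrs):
        # <br/> is a self-closing start tag, not an explicit end tag.
        # The default implementation would call handle_endtag, so skip it.
        pass

    def handle_endtag(self, tag):
        # html.parser passes the tag name already lowercased.
        self.end_tags += 1
        if tag in START_ONLY_TAGS:
            self.bogus_end_tags += 1


def bogus_end_tag_ratio(html_text):
    """Return the fraction of end tags that close a start-only element."""
    parser = BogusEndTagCounter()
    parser.feed(html_text)
    parser.close()
    if parser.end_tags == 0:
        return 0.0
    return parser.bogus_end_tags / parser.end_tags
```

A ratio like this could be bucketed into ranges the same way the rules
below are, but whether it separates spam from ham any better would have to
be measured.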

These two tests overlap quite a bit; I'm leaving it to the score
optimizer to figure out the final scores.

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
 387188   306777    80411    0.792   0.00    0.00  (all messages)
100.000  79.2321  20.7679    0.792   0.00    0.00  (all messages as %)
  2.305   2.7818   0.4863    0.851   0.58    1.00  HTML_BADTAG_00_10
  4.236   5.3309   0.0597    0.989   0.92    1.00  HTML_BADTAG_10_20
  1.165   1.4678   0.0087    0.994   0.93    1.00  HTML_BADTAG_20_30
  1.917   2.4171   0.0075    0.997   0.93    1.00  HTML_BADTAG_30_40
 15.944  20.1234   0.0000    1.000   0.97    1.00  HTML_BADTAG_40_50
  0.454   0.5731   0.0000    1.000   0.94    1.00  HTML_BADTAG_50_60
  1.023   1.2915   0.0000    1.000   0.94    1.00  HTML_BADTAG_60_70
  0.367   0.4635   0.0000    1.000   0.94    1.00  HTML_BADTAG_70_80
  0.127   0.1604   0.0000    1.000   0.94    1.00  HTML_BADTAG_80_90
  0.015   0.0186   0.0000    1.000   0.94    1.00  HTML_BADTAG_90_100
  1.312   1.5777   0.2985    0.841   0.56    1.00  HTML_NONELEMENT_00_10
  0.600   0.7168   0.1530    0.824   0.53    1.00  HTML_NONELEMENT_10_20
  0.598   0.7409   0.0510    0.936   0.77    1.00  HTML_NONELEMENT_20_30
  2.936   3.6994   0.0236    0.994   0.93    1.00  HTML_NONELEMENT_30_40
  0.966   1.2159   0.0112    0.991   0.92    1.00  HTML_NONELEMENT_40_50
 15.548  19.6214   0.0075    1.000   0.97    1.00  HTML_NONELEMENT_50_60
  1.477   1.8626   0.0037    0.998   0.94    1.00  HTML_NONELEMENT_60_70
  1.409   1.7749   0.0112    0.994   0.92    1.00  HTML_NONELEMENT_70_80
  1.556   1.9627   0.0025    0.999   0.94    1.00  HTML_NONELEMENT_80_90
  1.153   1.4558   0.0000    1.000   0.94    1.00  HTML_NONELEMENT_90_100

For HTML_MESSAGE messages only:

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
 265707   260392     5315    0.980   0.00    0.00  (all messages)
100.000  97.9997   2.0003    0.980   0.00    0.00  (all messages as %)
  3.359   3.2774   7.3565    0.308   0.03    1.00  HTML_BADTAG_00_10
  6.173   6.2805   0.9031    0.874   0.65    1.00  HTML_BADTAG_10_20
  1.697   1.7293   0.1317    0.929   0.77    1.00  HTML_BADTAG_20_30
  2.793   2.8476   0.1129    0.962   0.86    1.00  HTML_BADTAG_30_40
 23.234  23.7081   0.0000    1.000   0.99    1.00  HTML_BADTAG_40_50
  0.662   0.6751   0.0000    1.000   0.96    1.00  HTML_BADTAG_50_60
  1.491   1.5216   0.0000    1.000   0.96    1.00  HTML_BADTAG_60_70
  0.535   0.5461   0.0000    1.000   0.96    1.00  HTML_BADTAG_70_80
  0.185   0.1889   0.0000    1.000   0.96    1.00  HTML_BADTAG_80_90
  0.021   0.0219   0.0000    1.000   0.96    1.00  HTML_BADTAG_90_100
  1.912   1.8587   4.5155    0.292   0.03    1.00  HTML_NONELEMENT_00_10
  0.874   0.8445   2.3142    0.267   0.02    1.00  HTML_NONELEMENT_10_20
  0.871   0.8729   0.7714    0.531   0.14    1.00  HTML_NONELEMENT_20_30
  4.278   4.3584   0.3575    0.924   0.76    1.00  HTML_NONELEMENT_30_40
  1.407   1.4325   0.1693    0.894   0.69    1.00  HTML_NONELEMENT_40_50
 22.657  23.1167   0.1129    0.995   0.98    1.00  HTML_NONELEMENT_50_60
  2.152   2.1944   0.0564    0.975   0.89    1.00  HTML_NONELEMENT_60_70
  2.053   2.0911   0.1693    0.925   0.76    1.00  HTML_NONELEMENT_70_80
  2.267   2.3123   0.0376    0.984   0.91    1.00  HTML_NONELEMENT_80_90
  1.681   1.7151   0.0000    1.000   0.96    1.00  HTML_NONELEMENT_90_100
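
For anyone reading the S/O column: the per-rule figures in both tables are
consistent with S/O = SPAM% / (SPAM% + HAM%).  A small sketch that
recomputes it from two rows above (the function name spam_over_overall is
just illustrative, and the formula is inferred from the numbers, not taken
from the tool's source):

```python
def spam_over_overall(spam_pct, ham_pct):
    """S/O as inferred from the tables: SPAM% / (SPAM% + HAM%)."""
    total = spam_pct + ham_pct
    if total == 0:
        return 0.0  # rule never hit; S/O is undefined, report 0
    return spam_pct / total


# HTML_BADTAG_00_10 over all messages: SPAM% 2.7818, HAM% 0.4863
print(round(spam_over_overall(2.7818, 0.4863), 3))  # 0.851

# Same rule over HTML_MESSAGE messages only: SPAM% 3.2774, HAM% 7.3565
print(round(spam_over_overall(3.2774, 7.3565), 3))  # 0.308
```

The drop from 0.851 to 0.308 for the same rule shows how much of its
apparent value over all messages comes from the message being HTML at all.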

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
