"Loren Wilton" <[EMAIL PROTECTED]> writes:

> I'm trying to come up with a way to detect bogus end tags, and so far I'm
> not having much luck.
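As a rough illustration of the question above, here is a minimal Python sketch. SpamAssassin itself is Perl, so this is not its code. It uses the standard library `html.parser.HTMLParser` to count three things: all tags, tags whose names are not known HTML elements, and end tags on start-only elements such as `</br>`. It then maps the percentage of bad tags onto a 10%-wide rule name in the style of HTML_BADTAG_40_50. Everything here is an assumption for illustration. `KNOWN_ELEMENTS` is a hypothetical, deliberately truncated whitelist. `START_ONLY` is a subset of the HTML 4 empty elements. The bucket boundaries and the "no bucket at 0%" choice are guesses, not the 3.0 rule definitions.

```python
from html.parser import HTMLParser
from typing import Optional

# Subset of HTML 4 start-only (empty) elements: an end tag such as </br> is bogus.
START_ONLY = {"area", "base", "br", "col", "hr", "img", "input",
              "link", "meta", "param"}

# Hypothetical, deliberately short whitelist of legitimate element names;
# a real check would use the complete element list from the HTML spec.
KNOWN_ELEMENTS = START_ONLY | {
    "a", "b", "body", "div", "font", "head", "html", "i", "li", "p",
    "span", "table", "td", "title", "tr", "u", "ul",
}


class BogusTagCounter(HTMLParser):
    """Counts all tags, unknown tags, and end tags on start-only elements."""

    def __init__(self) -> None:
        super().__init__()
        self.total = 0
        self.unknown = 0
        self.bogus_end = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        self.total += 1
        if tag not in KNOWN_ELEMENTS:
            self.unknown += 1

    def handle_startendtag(self, tag, attrs):
        # XHTML-style <br/>: count it as a start tag only, so it is not
        # mistaken for a bogus </br> (the default would call handle_endtag too).
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.total += 1
        if tag not in KNOWN_ELEMENTS:
            self.unknown += 1
        elif tag in START_ONLY:
            self.bogus_end += 1


def bad_tag_percent(html: str) -> float:
    """Percentage of tags that are unknown or that end a start-only element."""
    counter = BogusTagCounter()
    counter.feed(html)
    counter.close()
    if counter.total == 0:
        return 0.0
    return 100.0 * (counter.unknown + counter.bogus_end) / counter.total


def bucket_name(prefix: str, percent: float) -> Optional[str]:
    """Map a percentage onto a hypothetical 10%-wide name like PREFIX_40_50."""
    if percent <= 0:
        return None  # assumption: no bad tags means no bucket fires
    low = min(int(percent // 10) * 10, 90)  # 100% lands in the 90_100 bucket
    return f"{prefix}_{low:02d}_{low + 10}"


if __name__ == "__main__":
    sample = "<html><body><p>Hi<br></br><blink>x</blink></p></body></html>"
    pct = bad_tag_percent(sample)
    print(pct, bucket_name("HTML_BADTAG", pct))
```

On the sample, the parser sees ten tags. Of these, `<blink>` and `</blink>` are unknown and `</br>` ends a start-only element, so the script prints `30.0 HTML_BADTAG_30_40`. Counting bogus end tags in the same ratio as unknown tags is one design choice. Keeping them as a separate counter, and therefore a separate rule, is another.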
3.0 will have a test for this, although it just looks for bad tags in general rather than specifically for bogus end tags. Someone could try enhancing the test to also notice when an end tag is used for a start-only element (for example, </br>). The two tests, HTML_BADTAG and HTML_NONELEMENT, overlap quite a bit; I'm leaving it to the score optimizer to figure out the final scores.

    OVERALL%     SPAM%      HAM%     S/O    RANK   SCORE  NAME
      387188    306777     80411   0.792    0.00    0.00  (all messages)
     100.000   79.2321   20.7679   0.792    0.00    0.00  (all messages as %)
       2.305    2.7818    0.4863   0.851    0.58    1.00  HTML_BADTAG_00_10
       4.236    5.3309    0.0597   0.989    0.92    1.00  HTML_BADTAG_10_20
       1.165    1.4678    0.0087   0.994    0.93    1.00  HTML_BADTAG_20_30
       1.917    2.4171    0.0075   0.997    0.93    1.00  HTML_BADTAG_30_40
      15.944   20.1234    0.0000   1.000    0.97    1.00  HTML_BADTAG_40_50
       0.454    0.5731    0.0000   1.000    0.94    1.00  HTML_BADTAG_50_60
       1.023    1.2915    0.0000   1.000    0.94    1.00  HTML_BADTAG_60_70
       0.367    0.4635    0.0000   1.000    0.94    1.00  HTML_BADTAG_70_80
       0.127    0.1604    0.0000   1.000    0.94    1.00  HTML_BADTAG_80_90
       0.015    0.0186    0.0000   1.000    0.94    1.00  HTML_BADTAG_90_100
       1.312    1.5777    0.2985   0.841    0.56    1.00  HTML_NONELEMENT_00_10
       0.600    0.7168    0.1530   0.824    0.53    1.00  HTML_NONELEMENT_10_20
       0.598    0.7409    0.0510   0.936    0.77    1.00  HTML_NONELEMENT_20_30
       2.936    3.6994    0.0236   0.994    0.93    1.00  HTML_NONELEMENT_30_40
       0.966    1.2159    0.0112   0.991    0.92    1.00  HTML_NONELEMENT_40_50
      15.548   19.6214    0.0075   1.000    0.97    1.00  HTML_NONELEMENT_50_60
       1.477    1.8626    0.0037   0.998    0.94    1.00  HTML_NONELEMENT_60_70
       1.409    1.7749    0.0112   0.994    0.92    1.00  HTML_NONELEMENT_70_80
       1.556    1.9627    0.0025   0.999    0.94    1.00  HTML_NONELEMENT_80_90
       1.153    1.4558    0.0000   1.000    0.94    1.00  HTML_NONELEMENT_90_100

For HTML_MESSAGE messages only:

    OVERALL%     SPAM%      HAM%     S/O    RANK   SCORE  NAME
      265707    260392      5315   0.980    0.00    0.00  (all messages)
     100.000   97.9997    2.0003   0.980    0.00    0.00  (all messages as %)
       3.359    3.2774    7.3565   0.308    0.03    1.00  HTML_BADTAG_00_10
       6.173    6.2805    0.9031   0.874    0.65    1.00  HTML_BADTAG_10_20
       1.697    1.7293    0.1317   0.929    0.77    1.00  HTML_BADTAG_20_30
       2.793    2.8476    0.1129   0.962    0.86    1.00  HTML_BADTAG_30_40
      23.234   23.7081    0.0000   1.000    0.99    1.00  HTML_BADTAG_40_50
       0.662    0.6751    0.0000   1.000    0.96    1.00  HTML_BADTAG_50_60
       1.491    1.5216    0.0000   1.000    0.96    1.00  HTML_BADTAG_60_70
       0.535    0.5461    0.0000   1.000    0.96    1.00  HTML_BADTAG_70_80
       0.185    0.1889    0.0000   1.000    0.96    1.00  HTML_BADTAG_80_90
       0.021    0.0219    0.0000   1.000    0.96    1.00  HTML_BADTAG_90_100
       1.912    1.8587    4.5155   0.292    0.03    1.00  HTML_NONELEMENT_00_10
       0.874    0.8445    2.3142   0.267    0.02    1.00  HTML_NONELEMENT_10_20
       0.871    0.8729    0.7714   0.531    0.14    1.00  HTML_NONELEMENT_20_30
       4.278    4.3584    0.3575   0.924    0.76    1.00  HTML_NONELEMENT_30_40
       1.407    1.4325    0.1693   0.894    0.69    1.00  HTML_NONELEMENT_40_50
      22.657   23.1167    0.1129   0.995    0.98    1.00  HTML_NONELEMENT_50_60
       2.152    2.1944    0.0564   0.975    0.89    1.00  HTML_NONELEMENT_60_70
       2.053    2.0911    0.1693   0.925    0.76    1.00  HTML_NONELEMENT_70_80
       2.267    2.3123    0.0376   0.984    0.91    1.00  HTML_NONELEMENT_80_90
       1.681    1.7151    0.0000   1.000    0.96    1.00  HTML_NONELEMENT_90_100

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
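A note on reading the S/O column in the tables above. My understanding is that S/O is SPAM% / (SPAM% + HAM%) for each rule, and treat that as an assumption about the hit-frequencies output rather than documentation. Under that reading, 1.000 means the rule never hit ham. The minimal Python check below recomputes S/O for a few rows copied verbatim from the tables. The helper name `s_over_o` and the `ROWS` list are illustrative only.

```python
def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Assumed S/O definition: spam hit rate over total hit rate."""
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule never hit")
    return spam_pct / total


# Rows copied from the tables: (label, SPAM%, HAM%, reported S/O)
ROWS = [
    ("HTML_BADTAG_00_10 (all messages)", 2.7818, 0.4863, 0.851),
    ("HTML_BADTAG_10_20 (all messages)", 5.3309, 0.0597, 0.989),
    ("HTML_BADTAG_00_10 (HTML_MESSAGE only)", 3.2774, 7.3565, 0.308),
    ("HTML_NONELEMENT_20_30 (HTML_MESSAGE only)", 0.8729, 0.7714, 0.531),
]

for label, spam, ham, reported in ROWS:
    computed = round(s_over_o(spam, ham), 3)
    print(f"{label}: computed {computed:.3f}, reported {reported:.3f}")
```

Each recomputed value matches the reported S/O to three decimals. This is consistent with the assumed definition and shows why the low buckets (00_10, 10_20) look much weaker once the corpus is restricted to HTML_MESSAGE mail.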
