On Fri, 27 Feb 2009, Savoy, Jim wrote:
> 0.000 0 206774 0 non-token data: nspam
> 0.000 0 1515235 0 non-token data: nham
>
> John Hardin wrote:
> > I got the impression that the goal was to have a ratio that roughly
> > reflected the spam:ham ratio of your raw mail stream.
>
> These two figures used to be closer together (with nspam being a much
> greater number), but we made a couple of changes last fall. First, I
> wiped the db clean and retrained it from scratch with 200 spam and 200
> ham. Then we added Spamhaus checks, which drop messages at SMTP time,
> before they ever reach SpamAssassin. It looks like about 75% of
> all incoming mail is now being dropped, and what does get through is
> usually good, hence the new ratio.

Note I said "raw"; by that I meant "before any filtering". Also, I was
speaking about manual training, though I could see where autolearn might
lead to the above ratio.
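
For anyone who wants to put a number on the ratio being discussed, here is a rough Python sketch that reads the two nspam/nham lines quoted above and works out spam:ham. The function name and the assumption that the count sits in the third whitespace-separated field are mine, inferred only from the two lines shown in this thread, so check it against your own dump output before relying on it.

```python
def parse_magic_counts(dump_text):
    """Pull the nspam and nham message counts out of dump-style lines.

    Assumes each relevant line looks like the ones quoted in this thread:
        0.000 0 206774 0 non-token data: nspam
    with the count in the third field (an assumption, not a format spec).
    """
    counts = {}
    for line in dump_text.splitlines():
        fields = line.split()
        if len(fields) >= 7 and fields[4] == "non-token" and fields[-1] in ("nspam", "nham"):
            counts[fields[-1]] = int(fields[2])
    return counts


sample = """\
0.000 0 206774 0 non-token data: nspam
0.000 0 1515235 0 non-token data: nham
"""

counts = parse_magic_counts(sample)
# Ratio of learned spam to learned ham, and spam as a share of all learned mail.
spam_to_ham = counts["nspam"] / counts["nham"]
spam_share = counts["nspam"] / (counts["nspam"] + counts["nham"])
print(f"spam:ham = {spam_to_ham:.3f}, spam share = {spam_share:.1%}")
```

This only describes what the Bayes db has learned. Comparing it with the raw stream would need a separate count of what arrives before any filtering, which the dump does not contain.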
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79