On Fri, 27 Feb 2009, Savoy, Jim wrote:

0.000          0     206774          0  non-token data: nspam
0.000          0    1515235          0  non-token data: nham

John Hardin wrote:

I got the impression that the goal was to have a ratio that roughly
reflected the spam:ham ratio of your raw mail stream.

These two figures used to be closer together (with nspam being a much greater number), but we made a couple of changes last fall. First, I wiped the db clean and re-trained it from scratch with 200 spam and 200 ham. Then we added Spamhaus checks, which drop messages at SMTP time, before they ever get to SpamAssassin. It looks like about 75% of all incoming mail is now being dropped, and what does get through is usually good, hence the new ratio.

Note I said "raw"; by that I meant "before any filtering". Also, I was speaking about manual training, though I could see where autolearn might lead to the above ratio.
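
For anyone wanting to check their own trained ratio, here is a rough Python sketch that pulls nspam and nham out of text in the layout quoted above and reports the spam share. It is not part of SpamAssassin; the names trained_counts and SAMPLE are made up for illustration, and it assumes the count sits in the third column, as it does in the two lines quoted in this thread.

```python
import re

# Sample input: the two lines quoted in this thread.
SAMPLE = """\
0.000          0     206774          0  non-token data: nspam
0.000          0    1515235          0  non-token data: nham
"""

# Assumed layout: prob, count, count, atime, then the "non-token data" label.
# The trained-message count is taken from the third column.
LINE_RE = re.compile(
    r"\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (nspam|nham)\s*$"
)


def trained_counts(dump_text):
    """Return a dict like {'nspam': int, 'nham': int} parsed from dump text."""
    counts = {}
    for line in dump_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts


counts = trained_counts(SAMPLE)
# Fraction of all trained messages that were learned as spam.
spam_share = counts["nspam"] / (counts["nspam"] + counts["nham"])
print(f"nspam={counts['nspam']} nham={counts['nham']} spam share={spam_share:.1%}")
```

To compare against the raw stream, the same spam share would have to be measured before any SMTP-time rejection, for example from MTA logs; that part is site-specific and not shown here.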

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  The first time I saw a bagpipe, I thought the player was torturing
  an octopus. I was amazed they could scream so loudly.
                                        -- cat_herder_5263 on Y! SCOX
-----------------------------------------------------------------------
 14 days until Albert Einstein's 130th Birthday
