--As of September 27, 2006 5:43:28 PM -0700, Kelson is alleged to have said:
Daniel T. Staal wrote:
True. So... Optimal is obviously to train, once and correctly, on all
messages. Sending a message through that has been trained will consume
*some* resources, but less than one that still needs to be learned.
So the exact balance is a complicated question. ;)
I just train on everything. If it's already learned from a message, it
takes a few resources for it to recognize that, but almost certainly less
time than it would have taken me to separate them out.
--As for the rest, it is mine.
Depends on the setup. For instance, given the explanations above, I'll
start a system to automatically learn from my 'checkspam' folder, but not
my 'highspam' folder. I have procmail automatically sort my spam by score,
so I can pay extra attention to low-scoring spam (which is more likely
than high-scoring spam to be misfiled ham).
So, since I *already* have them separated out, I can avoid the
double-check. ;)
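For illustration only, the score-based sorting described above could be
sketched roughly as follows. This is Python, not the actual procmail
recipe, which is not shown in this message. It assumes the message carries
an X-Spam-Status header of the form "Yes, score=12.3 required=5.0 ...";
the function name spam_folder and the cutoff of 10.0 are hypothetical.

```python
import re
from email import message_from_string

# Hypothetical cutoff between 'checkspam' and 'highspam'; tune to taste.
HIGH_SCORE = 10.0

# Assumes the header contains a "score=<number>" token.
_SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")


def spam_folder(raw_message, high_score=HIGH_SCORE):
    """Return 'highspam', 'checkspam', or None for mail not marked as spam."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    if not status.strip().lower().startswith("yes"):
        return None  # not flagged as spam, leave it alone
    match = _SCORE_RE.search(status)
    if match is None:
        # Flagged but no readable score: keep it where it gets reviewed.
        return "checkspam"
    score = float(match.group(1))
    return "highspam" if score >= high_score else "checkspam"
```

With the folders split this way, a learning job can be pointed at
'checkspam' alone and never needs to look at 'highspam'.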
Anyway, I just knew that there was an automatic system, and at the very
least there is *some* load to re-learning, even if a full analysis is
skipped. It would be interesting to see how much it actually is, compared
to an easy filter. If I find time, I may try to figure out a good test.
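One possible shape for such a test, as a rough sketch and not a finished
benchmark: time an action over a set of message files and compare the
per-message averages. The helper names average_seconds and command_action
are hypothetical. Using "sa-learn --spam <file>" as the re-learning command
is an assumption about the setup, and running it once per file also
measures process start-up, not just the learning step.

```python
import subprocess
import time


def average_seconds(action, paths, repeats=3):
    """Average wall-clock seconds per call of action(path)."""
    if not paths or repeats < 1:
        raise ValueError("need at least one path and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for path in paths:
            action(path)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(paths))


def command_action(argv_prefix):
    """Build an action that runs argv_prefix + [path] and discards output."""
    def action(path):
        subprocess.run(argv_prefix + [path], check=True,
                       stdout=subprocess.DEVNULL)
    return action


# Example use (assumes sa-learn is installed and the files exist):
#   relearn = command_action(["sa-learn", "--spam"])
#   print(average_seconds(relearn, already_learned_files))
```

Running the same harness over whatever cheap filter would otherwise skip
those messages gives the other half of the comparison.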
Daniel T. Staal
---------------------------------------------------------------
This email copyright the author. Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes. This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------