On Fri, 03 Sep 2004, Joe Flowers yowled:
> When I say "ham and spam curves", I'm envisioning 2 bell curves on the same
> graph, significantly separated - I hope, and SA
> automatically/continually keeping "5.0" sitting right in the middle between
> their peaks.

The GA (in 2.x) or perceptron (in 3.x) tries to tune the rule scores so that
non-spam stays below a score of 5.0, and spam stays above that score. They
are definitely significantly separated, but the tails inevitably overlap:
the scorer treats spam tails in the nonspam range as far preferable to
nonspam tails in the spam range.

There's no `continually', though: the GA took days to run, so it was done
in a big splurge just before x.x0 releases. (The perceptron is vastly
faster: there's talk of daily rule scoring runs and things like that now.
Individual scoring is now impractical only because most people don't have
a large enough ham and spam corpus to score as accurately as the release
process does.)

-- 
`The copyright file is for everyone. That we make it available in
 plain-text, uncompressed form rather than in spinning, throbbing
 OpenGL-rendered 3D text over a thumping dance music soundtrack is a
 feature, not a bug.' --- Branden Robinson
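
To make the asymmetric tuning described above concrete, here is a toy sketch. It is not SpamAssassin's actual scoring code: the rule names, the tiny corpus, the learning rate and the 10:1 cost ratio are all made up for illustration. It uses a plain perceptron-style update that nudges rule scores up when spam lands below 5.0, and knocks them down much harder when ham lands at or above 5.0.

```python
THRESHOLD = 5.0       # the spam threshold discussed above
FP_COST = 10.0        # hypothetical: ham scored as spam is punished 10x harder
FN_COST = 1.0         # hypothetical: spam scored as ham
LEARNING_RATE = 0.1   # hypothetical step size

# (rules hit, is_spam) pairs; the rule names are invented for this sketch.
MESSAGES = [
    ({"RULE_A", "RULE_C"}, True),
    ({"RULE_C"}, False),
    ({"RULE_A"}, True),
    ({"RULE_B"}, False),
    ({"RULE_A", "RULE_B"}, True),
]


def score(weights, hits):
    """Total score of a message: the sum of the scores of the rules it hit."""
    return sum(weights[rule] for rule in hits)


def train(messages, epochs=200):
    """Perceptron-style tuning with a larger step for false positives."""
    weights = {rule: 0.0 for hits, _ in messages for rule in hits}
    for _ in range(epochs):
        for hits, is_spam in messages:
            total = score(weights, hits)
            if is_spam and total < THRESHOLD:
                step = LEARNING_RATE * FN_COST      # missed spam: nudge up
            elif not is_spam and total >= THRESHOLD:
                step = -LEARNING_RATE * FP_COST     # flagged ham: push down hard
            else:
                continue                            # already on the right side
            for rule in hits:
                weights[rule] += step
    return weights


def count_errors(weights, messages):
    """Return (false positives, false negatives) at the 5.0 threshold."""
    false_pos = sum(1 for hits, is_spam in messages
                    if not is_spam and score(weights, hits) >= THRESHOLD)
    false_neg = sum(1 for hits, is_spam in messages
                    if is_spam and score(weights, hits) < THRESHOLD)
    return false_pos, false_neg


weights = train(MESSAGES)
print(weights, count_errors(weights, MESSAGES))
```

This toy corpus is cleanly separable, so the scores settle with every message on the right side of 5.0. On a real corpus the tails overlap and no set of scores gets zero errors; the larger false-positive step is what would bias the leftover mistakes toward spam sneaking under 5.0 rather than ham going over it.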
