On Fri, 03 Sep 2004, Joe Flowers yowled:
> When I say "ham and spam curves", I'm envisioning 2 bell curves on the same 
> graph, significantly separated - I hope, and SA
> automatically/continually keeping "5.0" sitting right in the middle between 
> their peaks.

The GA (genetic algorithm, in 2.x) or perceptron (in 3.x) tries to tune the rule scores so
that non-spam stays below a score of 5.0, and spam stays above that
score.
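
To make that concrete, here is a toy sketch of perceptron-style score
tuning. It is not the actual SA perceptron: the rule names, the tiny
corpus, the learning rate and the update rule are all made up for
illustration, and I'm assuming a message counts as spam once its total
reaches 5.0.

```python
# Toy perceptron-style tuning of rule scores (illustrative only, not SA's code).
THRESHOLD = 5.0  # assumed spam threshold, as discussed above


def total_score(scores, hits):
    """Sum the scores of the rules that fired on one message."""
    return sum(scores[rule] for rule in hits)


def tune(scores, corpus, rate=0.5, epochs=100):
    """Nudge rule scores until every message lands on the right side of 5.0.

    corpus is a list of (rules_hit, is_spam) pairs; rate and epochs are
    hypothetical tuning knobs.
    """
    scores = dict(scores)  # work on a copy
    for _ in range(epochs):
        errors = 0
        for hits, is_spam in corpus:
            s = total_score(scores, hits)
            if is_spam and s < THRESHOLD:
                # Missed spam: push the rules that fired upwards.
                for rule in hits:
                    scores[rule] += rate
                errors += 1
            elif not is_spam and s >= THRESHOLD:
                # Flagged ham: pull the rules that fired downwards.
                for rule in hits:
                    scores[rule] -= rate
                errors += 1
        if errors == 0:
            break  # everything is classified correctly
    return scores


# Hypothetical rules and hand-labelled messages.
initial = {"RULE_A": 1.0, "RULE_B": 1.0, "RULE_C": 1.0}
corpus = [
    (["RULE_A", "RULE_B"], True),   # spam
    (["RULE_A"], True),             # spam
    (["RULE_C"], False),            # ham
    (["RULE_B"], False),            # ham
]
tuned = tune(initial, corpus)
```

On this toy corpus the loop stops once both spam messages total at
least 5.0 and both ham messages stay below it. A real run needs a far
larger hand-sorted corpus.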

They are definitely significantly separated, but the tails inevitably
overlap: the scorer treats spam tails in the nonspam range (missed
spam) as far preferable to nonspam tails in the spam range (legitimate
mail flagged as spam).
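
One way to picture that preference is a cost function that charges far
more for a flagged ham than for a missed spam. This is only a sketch:
the weight of 50 and the message scores below are invented for
illustration, not taken from the real scoring runs.

```python
# Sketch of an asymmetric error cost (hypothetical weight and scores).
THRESHOLD = 5.0


def weighted_cost(ham_scores, spam_scores, fp_weight=50.0):
    """Count errors at the threshold, charging fp_weight per flagged ham."""
    false_positives = sum(1 for s in ham_scores if s >= THRESHOLD)
    false_negatives = sum(1 for s in spam_scores if s < THRESHOLD)
    return fp_weight * false_positives + false_negatives


# Score set 1: two spams slip under 5.0, but no ham is flagged.
cautious = weighted_cost(
    ham_scores=[0.5, 1.2, 4.1],
    spam_scores=[3.9, 4.8, 7.5, 12.0],
)

# Score set 2: every spam is caught, but one ham goes over 5.0.
aggressive = weighted_cost(
    ham_scores=[0.9, 2.0, 5.3],
    spam_scores=[5.4, 6.1, 9.0, 14.2],
)
```

With these numbers the cautious set costs 2.0 and the aggressive set
costs 50.0, so a tuner minimising this cost would keep the cautious
scores even though they let more spam through.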

There's no `continually', though: the GA took days to run, so it
was done in a big splurge just before x.x0 releases.

(The perceptron is vastly faster: there's talk of daily rule scoring
runs and things like that now. Individual scoring is now impractical
only because most people don't have a large enough ham and spam corpus
to score as accurately as the release process does.)

-- 
`The copyright file is for everyone.  That we make it available in
 plain-text, uncompressed form rather than in spinning, throbbing
 OpenGL-rendered 3D text over a thumping dance music soundtrack is a
 feature, not a bug.' --- Branden Robinson
