Theo Van Dinter wrote:
On Tue, Mar 07, 2006 at 08:44:59PM -0500, Gabriel M. Wachman wrote:
The perceptron (the form of neural net used in SA 3.0.0 and higher) is used by the
developers to generate the scores prior to release. 99.9% of end-users do not
ever use the perceptron.
By "do not use," do you mean that it is completely ignored during
classification, or that only the fixed, pre-trained neural net is used?
The outputs of the perceptron are scores (weights), which are used during
classification. As Matt said, users tend not to generate their own scores
and therefore don't run the perceptron; they just use the output from when
it's run pre-release.
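
To make the reply above concrete: classifying with pre-generated scores amounts to a lookup and a sum. The following is only a minimal sketch in Python, not SpamAssassin's actual code. The rule names and score values are made up for illustration, and the 5.0 threshold is an assumption meant to mirror a typical required_score setting.

```python
# Hypothetical rule scores, standing in for the weights shipped with a release.
SCORES = {
    "RULE_A": 2.5,
    "RULE_B": 1.8,
    "RULE_C": -1.0,
    "RULE_D": 3.1,
}

REQUIRED_SCORE = 5.0  # assumed spam threshold


def classify(rules_hit, scores=SCORES, required=REQUIRED_SCORE):
    """Sum the fixed score of every rule that fired; no training happens here."""
    total = sum(scores.get(rule, 0.0) for rule in rules_hit)
    return total, total >= required
```

Nothing in classify() learns or updates anything, which is the sense in which end users "do not use" the perceptron: they only consume the numbers it produced earlier.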
OK, I think I see where the confusion is: is it a perceptron or a neural
net? For anyone who doesn't know, a perceptron is a single-element
neural net, if one wanted to call it that, but really it's just a linear
classifier. There are two reasons why it seems highly unlikely that
SpamAssassin's scores were trained with a multilayer neural net. 1) Back-propagation is an
algorithm used on multi-layer neural nets and so does not really make
sense in the context of training a perceptron (there's nothing to
back-propagate to). 2) You can't save "scores" from a multilayer neural
net as "if feature X is 1, add Y to the score." Multilayer nets compute
nonlinear functions that aren't simple weighted sums of features (and if
a weighted sum of features is all you need, just use a perceptron). That
may be the crux of my confusion, since if there is a neural net
somewhere, it needs to be running inside SpamAssassin during
classification (even if it does not update itself). If it's just a
perceptron, then I see how this works.
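
As an illustration of the distinction drawn above, here is a rough sketch of the textbook perceptron learning rule in Python. It is not necessarily what the SpamAssassin developers run; the function names, the 0/1 feature encoding, and the +1/-1 labels are assumptions made for the example. The point is that each weight is adjusted directly from the output error, with no hidden layer to back-propagate through, and the learned weights can be saved as one additive score per feature.

```python
def train_perceptron(examples, n_features, epochs=10, lr=1.0):
    """Train a single-layer perceptron.

    examples: list of (features, label) pairs, where features is a list of
    0/1 values (rule hit or not) and label is +1 (spam) or -1 (ham).
    """
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for x, y in examples:
            activation = bias + sum(w * xi for w, xi in zip(weights, x))
            predicted = 1 if activation > 0 else -1
            if predicted != y:
                # Direct update from the output error; there are no hidden
                # units, so nothing is propagated backwards.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def score(weights, bias, x):
    """Apply the saved weights: 'if feature i is 1, add weights[i]'."""
    return bias + sum(w * xi for w, xi in zip(weights, x))
```

Once training is done, only score() is needed at classification time, which matches the "fixed scores" picture; a multilayer net would instead have to be evaluated in full for each message.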
The motivation for this is that I'm comparing a filter a colleague wrote
to various other filters (including SpamAssassin) and I want to make
sure that the summary I give of SpamAssassin in my paper is accurate.
Neural net vs. perceptron is a large distinction in our community, so I
wouldn't want to be wrong about it.
Thanks again,
Gabriel