-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Marty,
I've looked over the interface document quickly. This looks basically
fine. I have to admit that I'm unlikely to actually write a classifier
or tokenizer (because in a couple of weeks I'm moving from one end of
the country to the other) so I haven't given it the closest combing
that I might. One thing though. You talk about a Classification being
a string category and a % probability of the mail falling into that
category. However, you don't define which categories are valid or what
their meaning is. If the core TarProxy software (as opposed to a
plugin) is to use the categories, you'll need to define which ones are
valid along with semantics for them. Also, I wonder if an array of
Classifications should be returned. That way, classify(Token token)
would return e.g. {{"Spam", 0.65}, {"Clean", 0.40}, {"Worm", 0.05}}
with an entry for each possible classification. Otherwise, how would
it know which classification to return? Alternatively, you could
simply make this a Spam classifier (rather than trying for
hyper-generality), in which case the return from classify(Token token)
is just the probability of the message so far being spam.
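
To make the two options concrete, here is a rough Java sketch of both
return shapes. Only classify(Token token), Token and Classification come
from the interface document; every other name here (ClassifierSketch,
FixedClassifier, SpamClassifier, spamOnly, the fields) is made up for
illustration, and the probabilities are just the example figures above,
not real output.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassifierSketch {

    /** Stand-in for the Token type in the interface document (field is assumed). */
    static final class Token {
        final String text;
        Token(String text) { this.text = text; }
    }

    /** A category name plus the probability that the mail falls into it. */
    static final class Classification {
        final String category;
        final double probability;
        Classification(String category, double probability) {
            this.category = category;
            this.probability = probability;
        }
    }

    /** Option 1: one entry per category the classifier knows about. */
    interface Classifier {
        List<Classification> classify(Token token);
    }

    /** Option 2: spam only, so a bare probability is enough. */
    interface SpamClassifier {
        double classify(Token token);
    }

    /** Toy classifier that always returns the example figures (hypothetical). */
    static final class FixedClassifier implements Classifier {
        public List<Classification> classify(Token token) {
            List<Classification> result = new ArrayList<>();
            result.add(new Classification("Spam", 0.65));
            result.add(new Classification("Clean", 0.40));
            result.add(new Classification("Worm", 0.05));
            return result;
        }
    }

    /** Adapts option 1 to option 2 by picking out the "Spam" entry. */
    static SpamClassifier spamOnly(Classifier classifier) {
        return token -> {
            for (Classification c : classifier.classify(token)) {
                if (c.category.equals("Spam")) {
                    return c.probability;
                }
            }
            return 0.0; // no "Spam" entry: treated as zero here (an assumption)
        };
    }

    public static void main(String[] args) {
        Token token = new Token("example");
        for (Classification c : new FixedClassifier().classify(token)) {
            System.out.println(c.category + " " + c.probability);
        }
        System.out.println(spamOnly(new FixedClassifier()).classify(token));
    }
}
```

With the list form, the core software still needs a fixed set of
category names to look for (the "Spam" string in spamOnly is exactly
that kind of dependency), which is why the valid categories and their
semantics would have to be defined somewhere.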
Cheers,
Andrew
-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>
iQA/AwUBPmeB+wXqoqbqowOrEQJ9YwCgm09rvGs3pgbIWgTxB12d71A4KSwAoNkU
yp2Foz8Jx5XzsD0UGg3s6YPE
=E6vk
-----END PGP SIGNATURE-----