Exactly.

One additional detail: the format that the Bayes classifier wants is pretty
easy to generate from a Lucene term vector.

It is probably a good idea to experiment with emitting multiple copies of
repeated terms.
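
A rough sketch of what I mean, in Python rather than against the real Lucene
API. It assumes you have already pulled the term vector out as a plain
term -> frequency mapping. The function name, the `repeat` flag, and the tab
between the label and the tokens are all my own assumptions for illustration,
so check the separator against what your classifier version actually expects.

```python
import re

def to_training_line(label, term_freqs, repeat=True, sep="\t"):
    """Build one training line: the label first, then space-separated tokens.

    term_freqs is a hypothetical stand-in for a Lucene term vector,
    given here as a dict mapping term -> frequency.
    """
    tokens = []
    for term, freq in sorted(term_freqs.items()):
        # Drop anything that is not a letter, digit, or underscore.
        clean = re.sub(r"[^\w]", "", term.lower())
        if not clean:
            continue
        # Emit the term once per occurrence, or just once if repeat is off.
        tokens.extend([clean] * (freq if repeat else 1))
    return label + sep + " ".join(tokens)

example = {"lucene": 2, "vector": 1, "term": 3}
with_repeats = to_training_line("search", example)
without_repeats = to_training_line("search", example, repeat=False)
```

Writing both variants out and comparing classifier accuracy is the experiment
I had in mind.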

On Tue, Apr 5, 2011 at 2:10 PM, Daniel McEnnis <[email protected]> wrote:

> It's actually not raw text that the Bayes classifier takes but
> tokenized words: no punctuation, and tokens separated by a space. One
> file per line, with the classification starting every line.  I hope
> this helps...
>
