Re: Classification with data from Lucene

2011-04-05 Thread Ted Dunning
Exactly. One additional detail: the format that the Bayes classifier wants is pretty easy to generate from a Lucene term vector. It is probably a good idea to experiment with emitting multiple copies of repeated terms. On Tue, Apr 5, 2011 at 2:10 PM, Daniel McEnnis wrote: > It's actually not text
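
A minimal sketch of that idea, not Mahout or Lucene code. It assumes the term vector has already been read out of the index into a plain term-to-frequency mapping, and it assumes the classifier input is one document per line in the form label, tab, space-separated tokens (check this against the Mahout version in use). The function name `term_vector_to_line` and the `repeat` flag are hypothetical, introduced only for illustration.

```python
from typing import Mapping


def term_vector_to_line(label: str, term_freqs: Mapping[str, int], repeat: bool = True) -> str:
    """Build one training line: label, a tab, then space-separated tokens.

    ASSUMPTION: the 'label<TAB>tokens' layout is what the classifier expects.
    With repeat=True each term is emitted once per occurrence, so the term
    frequency stored in the index survives in the generated text.
    """
    tokens = []
    # Sort terms so the output is deterministic; word order is lost in a
    # term vector anyway.
    for term, freq in sorted(term_freqs.items()):
        copies = freq if repeat else 1
        tokens.extend([term] * copies)
    return label + "\t" + " ".join(tokens)


if __name__ == "__main__":
    # Hypothetical term vector for one document labelled CategoryX.
    print(term_vector_to_line("CategoryX", {"lucene": 2, "index": 1}))
```

Running with `repeat=False` gives the one-copy-per-term variant, which makes it easy to compare the two settings on the same data.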

Re: Classification with data from Lucene

2011-04-05 Thread Daniel McEnnis
: Tuesday, April 05, 2011 3:19 PM > To: user@mahout.apache.org > Subject: Re: Classification with data from Lucene > > The Lucene intake does not support searches on the index. > > If you can make copies of the index, here's a trick: delete the > documents you don't

RE: Classification with data from Lucene

2011-04-05 Thread David Croley
o dump the words and frequencies from the index, add a label, and modify the BayesFeatureDriver class to take my input. David -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Tuesday, April 05, 2011 3:19 PM To: user@mahout.apache.org Subject: Re: Classification

Re: Classification with data from Lucene

2011-04-05 Thread Lance Norskog
The Lucene intake does not support searches on the index. If you can make copies of the index, here's a trick: delete the documents you don't want, then optimize the index. You will need a Lucene program to do this. Use this to separate the big index into training and test indexes. On Mon, Apr
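
One way to decide which documents to delete from each copy, sketched in plain Python rather than Lucene code. It assumes every document carries a stable identifier stored in the index (a hypothetical key field, not Lucene's internal document number, which can change when the index is optimized). The function name `split_doc_ids` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Tuple


def split_doc_ids(doc_ids: Iterable[str], test_fraction: float = 0.2, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Partition document identifiers into (train_ids, test_ids).

    ASSUMPTION: doc_ids are stable keys stored with each document.
    A fixed seed makes the split reproducible across runs.
    """
    ids = sorted(doc_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = int(len(ids) * test_fraction)
    test_ids = sorted(ids[:n_test])
    train_ids = sorted(ids[n_test:])
    return train_ids, test_ids


if __name__ == "__main__":
    train_ids, test_ids = split_doc_ids([f"doc{i}" for i in range(10)])
    # In the training copy of the index, delete everything in test_ids;
    # in the test copy, delete everything in train_ids; then optimize both.
    print(len(train_ids), len(test_ids))
```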

Classification with data from Lucene

2011-04-04 Thread David Croley
I have a large Lucene index (with TermFreq vectors). I do not have easy access to the original source docs that the index was made from. I have identified a set of docs in the index as Category X. Is there a way to run Mahout's Bayesian classification algorithm, trained on the docs in Category X, o