This sounds really interesting. Is there a way to dump certain fields from a Lucene index to text files?
If so, I could use Lucene to do the parsing, and then seqdirectory and seq2sparse to generate Mahout vectors out of these files.

Thanks,
Julian

2011/5/3 Jake Mannix <jake.man...@gmail.com>

> On Tue, May 3, 2011 at 6:17 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>
> > > Although technically, we could add the capability to take a Store.YES field
> > > and re-tokenize and build vectors from this as well.
> >
> > True, or we could just dump stored fields out to text and use the existing
> > text converter
>
> That would probably be the right way to do that, actually.
>
>   -jake