I've never implemented LSI.  Is there a way to incrementally build the model 
(by simply indexing documents) or is it something that one only runs after the 
fact once one has built up the much bigger matrix?  If it's the former, I bet 
it wouldn't be that hard to just implement the appropriate new codecs and 
similarity, assuming Lucene trunk.  If it's the latter, then Ted's comment 
about pushing back into Lucene gets a bit hairier.  Still, I wonder if the 
Codecs/Similarity could help here, too.

What does a typical workflow look like for building all of this?
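On the incremental-versus-batch question above: textbook LSI is a batch step, a truncated SVD of the whole term-document matrix (A ~ U_k S_k V_k^T). New documents can still be approximately "folded in" to an existing model without recomputing the SVD, by projecting each new term vector d onto the existing basis: d_hat = S_k^-1 U_k^T d. Below is a minimal sketch of both steps. It assumes NumPy and does not use any Mahout or Lucene API. The names `build_lsi` and `fold_in` are hypothetical, and raw term counts stand in for the tf-idf weighting a real pipeline would normally use. At real corpus sizes the dense `np.linalg.svd` call would have to be replaced by a sparse or distributed solver.

```python
import numpy as np


def build_lsi(term_doc, k):
    """Batch LSI (hypothetical helper): truncated SVD of a terms x documents
    matrix, returning U_k (terms x k), the top-k singular values, and V_k^T
    (k x documents)."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def fold_in(doc_vec, u_k, s_k):
    """Fold a new document into an existing LSI space (hypothetical helper):
    d_hat = S_k^-1 U_k^T d. The basis U_k and singular values are NOT updated."""
    return (u_k.T @ doc_vec) / s_k


def cosine(a, b):
    """Cosine similarity between two vectors in the reduced space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    # Toy 5-term x 4-document count matrix (illustrative data only).
    A = np.array([[1, 0, 1, 0],
                  [1, 1, 0, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 1],
                  [1, 0, 0, 1]], dtype=float)
    u_k, s_k, vt_k = build_lsi(A, k=2)

    # A document arriving after the SVD was computed.
    new_doc = np.array([1, 0, 1, 0, 1], dtype=float)
    d_hat = fold_in(new_doc, u_k, s_k)

    # Rank the original documents by similarity to the folded-in one.
    scores = [cosine(d_hat, vt_k[:, j]) for j in range(A.shape[1])]
    print(scores)
```

Folding an original column of A back in reproduces that document's column of V_k^T exactly, which is a handy sanity check. Folded-in documents never change U_k or the singular values, though, so the basis slowly goes stale as the corpus drifts; a common compromise is to fold in new documents as they are indexed and recompute the full SVD periodically.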

On Nov 13, 2011, at 3:58 PM, Ted Dunning wrote:

> Essentially not.
> 
> And I would worry about how to push the LSI vectors back into Lucene in a
> coherent and usable way.
> 
> On Sun, Nov 13, 2011 at 10:47 AM, Sebastian Schelter wrote:
> 
>> Is there some documentation/tutorial available on how to build an LSI
>> pipeline with Mahout and Lucene?
>> 
>> --sebastian
>> 

