On Sun, Nov 13, 2011 at 10:09 PM, Ted Dunning <[email protected]> wrote:
> That handles coherent.
>
> It doesn't handle usable.
>
> Storing the vectors as binary payloads handles the situation for
> projection-like applications, but that doesn't help retrieval.
>

It's not just projection, it's for added relevance: if you are already
using Lucene for your scoring needs, you are already getting some good
precision and recall.

The idea is this: you take results you are *already* scoring, and add to
that scoring function an LSI cosine as one feature among many. Hopefully
it will improve precision, even if it will do nothing for recall (as it's
only being applied to results already retrieved by the text query).

Alternatively, to improve recall: at index time, supplement each document
with a new field "lsi_expanded" holding the terms that are closest to the
document in the SVD-projected space but aren't already in it. Then at
query time, add an "... OR lsi_expanded:<query>" clause onto your query.
Instant query expansion for recall enhancement.

Or do both, and play with both your precision and recall.

-jake
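
A minimal sketch of both ideas on a toy corpus, using plain NumPy rather than Lucene or Mahout. Everything named here is an assumption for illustration: the functions `fit_lsi`, `rescore`, `lsi_expanded_terms` and `expanded_query`, the additive `lsi_weight` blend, and the choice to treat each term as a one-term pseudo-document when measuring its closeness to a document. A real setup would read the projected vectors from wherever they are stored and fold the cosine into its own scoring function.

```python
import numpy as np

def fit_lsi(term_doc, k):
    """Truncated SVD of a terms x docs matrix; returns U_k (terms x k)."""
    U, _s, _Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k]

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rescore(hits, query_vec, term_doc, U_k, lsi_weight=0.5):
    """Precision: add an LSI cosine feature to documents already retrieved.

    hits maps doc_id -> text score from the existing engine. Only these
    documents are touched, so recall is unchanged.
    """
    q = U_k.T @ query_vec  # query projected into the k-dim latent space
    scored = {
        doc: text_score + lsi_weight * cosine(q, U_k.T @ term_doc[:, doc])
        for doc, text_score in hits.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

def lsi_expanded_terms(doc_vec, U_k, vocab, n=1):
    """Recall: terms closest to the document in latent space, not already in it.

    Row t of U_k is the projection of a pseudo-document containing only term t.
    """
    d = U_k.T @ doc_vec
    candidates = [
        (cosine(d, U_k[t]), vocab[t])
        for t in range(len(vocab))
        if doc_vec[t] == 0  # skip terms the document already contains
    ]
    candidates.sort(reverse=True)
    return [term for _sim, term in candidates[:n]]

def expanded_query(query):
    """Query string with the extra OR clause on the hypothetical lsi_expanded field."""
    return f"({query}) OR lsi_expanded:({query})"

# Toy corpus (made up): rows are terms, columns are documents.
vocab = ["car", "auto", "engine", "flower", "petal"]
term_doc = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # auto
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # petal
], dtype=float)
U_k = fit_lsi(term_doc, k=2)

# Index time: terms to store in lsi_expanded for document 0 ("car engine").
print(lsi_expanded_terms(term_doc[:, 0], U_k, vocab, n=1))

# Query time: rescore two hits that tied on text score for "engine flower".
query_vec = np.array([0, 0, 1, 1, 0], dtype=float)
print(rescore({0: 1.0, 2: 1.0}, query_vec, term_doc, U_k))
print(expanded_query("engine"))
```

The additive blend is only one way to use the cosine; if the scoring function is learned, the cosine would more naturally be one input feature among the others, with its weight fitted rather than hand-picked.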
