Quick correction: I remembered that precomputing before the index is
populated wouldn't work for me in this case, because the term frequency
data for the full corpus wouldn't exist yet.
On Tuesday, March 4, 2014 11:56:04 AM UTC+2, Kevin B wrote:
As background, I have some Lucene-based code which is used to manipulate
index statistics to generate numeric document vectors. This code sits
between systems that need document vectors as input and the Lucene indexes
that store the source data and statistics (term and document
frequencies).