Hello,
We're using a sorted index to implement early termination efficiently
over an index of hundreds of millions of documents. For now, we're using
the default codecs that ship with Lucene 4, but we believe that, because
the docids are sorted, we should be able to do much
Do you mean you are pre-sorting the documents (by what criteria?)
yourself, before adding them to the index?
In which case... you should already be seeing some benefit (a smaller
index size) compared with adding them in random order (i.e. the vInts
should take fewer bytes), I think. (Probably the savings
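
To make the vInt remark concrete, here is a minimal sketch, assuming postings are stored as gaps between consecutive docids and each gap is written with a 7-bits-per-byte variable-length encoding. The class and method names (VIntGaps, vIntLength, encodedSize) are hypothetical and are not Lucene APIs; the docid values are made up for illustration.

```java
public class VIntGaps {
    // Bytes needed for a non-negative int when 7 payload bits are stored per byte
    // (assumed encoding; the high bit of each byte marks "more bytes follow").
    static int vIntLength(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    // Total bytes needed to store an ascending docid list as gaps from the previous docid.
    static int encodedSize(int[] sortedDocIds) {
        int total = 0;
        int previous = 0;
        for (int docId : sortedDocIds) {
            total += vIntLength(docId - previous);
            previous = docId;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] clustered = {1000, 1001, 1003, 1004};          // small gaps
        int[] scattered = {1000, 50000, 3000000, 400000000}; // large gaps
        System.out.println(encodedSize(clustered)); // prints 5
        System.out.println(encodedSize(scattered)); // prints 14
    }
}
```

Whether a particular sort order actually produces smaller gaps depends on how the matching documents for each term end up distributed over the docid space, so this only illustrates the mechanism.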
Hello Michael,
Yes, we are pre-sorting the documents before adding them to the index. We
have a score associated with every document (not an IR score but a
document-related score that reflects its importance). Therefore, the
document with the highest score will have the lowest docid (we add it
On Thu, Apr 12, 2012 at 6:35 PM, Carlos Gonzalez-Cadenas
c...@experienceon.com wrote:
Hello Michael,
Yes, we are pre-sorting the documents before adding them to the index. We
have a score associated with every document (not an IR score but a
document-related score that reflects its importance).
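
As an illustration of why this ordering enables early termination, here is a minimal sketch, assuming documents were added in descending score order so that a lower docid always means a higher score. It intersects two ascending postings lists (a two-term AND query) and stops as soon as k hits are found, since any later match would have a lower score. The class and method names are hypothetical and this is not the Lucene API.

```java
import java.util.Arrays;

public class EarlyTermination {
    // Returns up to k docids present in both ascending postings lists.
    // Because docid order is assumed to equal descending score order,
    // the first k matches are the top-k results and scanning can stop there.
    static int[] topKConjunction(int[] a, int[] b, int k) {
        int[] hits = new int[Math.max(0, Math.min(k, Math.min(a.length, b.length)))];
        int n = 0, i = 0, j = 0;
        while (i < a.length && j < b.length && n < hits.length) {
            if (a[i] == b[j]) {
                hits[n++] = a[i]; // match: both terms contain this docid
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++; // advance the list that is behind
            } else {
                j++;
            }
        }
        return Arrays.copyOf(hits, n);
    }

    public static void main(String[] args) {
        int[] termA = {0, 2, 5, 9, 12, 20};
        int[] termB = {2, 3, 9, 12, 15, 20};
        // Stops after the second match; docids 12 and 20 are never collected.
        System.out.println(Arrays.toString(topKConjunction(termA, termB, 2))); // prints [2, 9]
    }
}
```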