On Thu, Apr 12, 2012 at 6:35 PM, Carlos Gonzalez-Cadenas
wrote:
Hello Michael,
Yes, we are pre-sorting the documents before adding them to the index. We
have a score associated with every document (not an IR score but a
document-related score that reflects its "importance"). Therefore, the
document with the highest score will have the lowest docid (we add it fir
Do you mean you are pre-sorting the documents (by what criteria?)
yourself, before adding them to the index?
In which case... you should already be seeing some benefits (smaller
index size) compared with adding them "randomly" (i.e. the vInts should
take fewer bytes), I think. (Probably the savings w
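
To make the vInt point concrete, here is a rough sketch (not Lucene code) that counts the bytes needed to store a postings list as vInt-encoded docid gaps, at 7 payload bits per byte. The helper names `vint_len` and `postings_bytes` and both sample lists are made up for illustration. Whether a given sort order actually produces smaller gaps depends on the data.

```python
def vint_len(value: int) -> int:
    """Bytes needed for a non-negative int at 7 payload bits per byte."""
    if value < 0:
        raise ValueError("vInt sketch only handles non-negative values")
    length = 1
    while value >= 0x80:  # more than 7 bits left, so another byte is needed
        value >>= 7
        length += 1
    return length


def postings_bytes(doc_ids):
    """Total bytes to store an ascending docid list as vInt-encoded gaps."""
    total = 0
    previous = 0
    for doc_id in doc_ids:
        total += vint_len(doc_id - previous)  # store the gap, not the docid
        previous = doc_id
    return total


# Hypothetical postings for one term: same length, very different gaps.
clustered = [3, 10, 42, 90, 150]
scattered = [3, 20_000, 400_000, 5_000_000, 90_000_000]

print(postings_bytes(clustered), postings_bytes(scattered))
```

The closer together a term's docids fall, the more of its gaps fit in a single byte, which is where any index-size saving would come from.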
Hello,
We're using a sorted index in order to implement early termination
efficiently over an index of hundreds of millions of documents. As of now,
we're using the default codecs that ship with Lucene 4, but we believe
that, because the docids are sorted, we should be able to do much
bet