Re: codecs for sorted indexes

2012-04-12 Thread Robert Muir
On Thu, Apr 12, 2012 at 6:35 PM, Carlos Gonzalez-Cadenas wrote:
> Hello Michael,
>
> Yes, we are pre-sorting the documents before adding them to the index. We
> have a score associated to every document (not an IR score but a
> document-related score that reflects its "importance"). Therefore, the

Re: codecs for sorted indexes

2012-04-12 Thread Carlos Gonzalez-Cadenas
Hello Michael, Yes, we are pre-sorting the documents before adding them to the index. We have a score associated with every document (not an IR score but a document-related score that reflects its "importance"). Therefore, the document with the highest score will have the lowest docid (we add it fir

Re: codecs for sorted indexes

2012-04-12 Thread Michael McCandless
Do you mean you are pre-sorting the documents (by what criteria?) yourself, before adding them to the index? In that case you should already be seeing some benefit (a smaller index) compared with adding them in "random" order (i.e. the vInts should take fewer bytes), I think. (Probably the savings w
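
To make the vInt remark concrete, here is a minimal Java sketch of how the size of a gap-encoded posting list depends on how close together the docids are. It assumes the usual 7-bits-per-byte variable-length integer layout; the class and method names (VIntGapSketch, vIntLength, encodedSize) and the sample docids are made up for illustration and are not Lucene API. Whether pre-sorting by an importance score actually produces smaller gaps depends on the data.

```java
public class VIntGapSketch {
    // Bytes needed to store a non-negative int with 7 payload bits per byte.
    static int vIntLength(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    // Total bytes for an ascending docid list stored as gaps between neighbours.
    static int encodedSize(int[] sortedDocIds) {
        int total = 0;
        int previous = 0;
        for (int docId : sortedDocIds) {
            total += vIntLength(docId - previous);
            previous = docId;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical posting lists: one with docids close together, one spread out.
        int[] clustered = {1000, 1001, 1003, 1010};
        int[] scattered = {1000, 200000, 4000000, 90000000};
        System.out.println("clustered bytes: " + encodedSize(clustered));
        System.out.println("scattered bytes: " + encodedSize(scattered));
    }
}
```

Small gaps fit in a single byte each, so a term whose documents end up with neighbouring docids costs fewer bytes than one whose documents are spread across the whole index.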

codecs for sorted indexes

2012-04-12 Thread Carlos Gonzalez-Cadenas
Hello, We're using a sorted index in order to implement early termination efficiently over an index of hundreds of millions of documents. As of now, we're using the default codecs that ship with Lucene 4, but we believe that because the docids are sorted, we should be able to do much bet
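
For readers unfamiliar with the idea, the following is a rough Java sketch of early termination over a sorted index, not the poster's implementation. It assumes docids were assigned in descending order of the static score, so the first k matches met while walking the posting lists are already the k best. The class name, method name, and sample posting lists are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyTerminationSketch {
    // Intersects two ascending docid lists and stops as soon as k hits are found.
    // With docids assigned in descending score order, these are the top-k hits.
    static List<Integer> firstKMatches(int[] a, int[] b, int k) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length && hits.size() < k) {
            if (a[i] == b[j]) {
                hits.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical posting lists for two query terms.
        int[] termA = {1, 3, 5, 8, 13};
        int[] termB = {3, 4, 8, 13, 21};
        System.out.println(firstKMatches(termA, termB, 2));
    }
}
```

The loop never touches the tail of either list once k hits are collected, which is where the savings on an index of hundreds of millions of documents would come from.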