codecs for sorted indexes

2012-04-12 Thread Carlos Gonzalez-Cadenas
Hello, We're using a sorted index to implement early termination efficiently over an index of hundreds of millions of documents. For now, we're using the default codecs that ship with Lucene 4, but we believe that, because the docids are sorted, we should be able to do much

Re: codecs for sorted indexes

2012-04-12 Thread Michael McCandless
Do you mean you are pre-sorting the documents (by what criteria?) yourself, before adding them to the index? In which case... you should already be seeing some benefit (a smaller index) compared with adding them in random order (i.e. the vInts should take fewer bytes), I think. (Probably the savings
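
To make the vInt remark concrete, here is a minimal sketch, assuming postings are stored as gaps between consecutive docids and each gap is written in a 7-bits-per-byte variable-length format. The class and method names (VIntSizeSketch, vIntSize, encodedSize) and the sample docids are hypothetical and are not Lucene API. It only shows that smaller gaps need fewer bytes; whether a particular sort order actually produces smaller gaps depends on the data.

```java
final class VIntSizeSketch {

    // Bytes needed for a non-negative int when each byte carries 7 payload bits
    // and the high bit flags "more bytes follow".
    static int vIntSize(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    // Total bytes for an ascending docid list stored as gaps (deltas),
    // with the first docid stored as a gap from 0.
    static int encodedSize(int[] sortedDocIds) {
        int total = 0;
        int previous = 0;
        for (int docId : sortedDocIds) {
            total += vIntSize(docId - previous);
            previous = docId;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical postings for one term: tightly clustered vs. widely spread docids.
        int[] clustered = {1000, 1003, 1010, 1050};
        int[] scattered = {1000, 500000, 3000000, 90000000};
        System.out.println("clustered bytes: " + encodedSize(clustered));
        System.out.println("scattered bytes: " + encodedSize(scattered));
    }
}
```

Both lists hold four docids, but the clustered one encodes in fewer bytes because three of its gaps fit in a single byte each.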

Re: codecs for sorted indexes

2012-04-12 Thread Carlos Gonzalez-Cadenas
Hello Michael, Yes, we are pre-sorting the documents before adding them to the index. We have a score associated with every document (not an IR score but a document-related score that reflects its importance). Therefore, the document with the highest score will have the lowest docid (we add it
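
The early-termination idea described in this thread can be sketched roughly as follows, assuming docids are assigned in descending order of the static score, so a postings list read in ascending docid order is already in best-first order. EarlyTerminationSketch and topK are hypothetical names for illustration, not Lucene classes, and the sketch ignores deletions, multiple segments, and query-dependent scoring.

```java
import java.util.ArrayList;
import java.util.List;

final class EarlyTerminationSketch {

    // Returns at most k docids from a postings list in ascending docid order.
    // Under the assumption "lower docid = higher static score", the first k
    // matches are the top k, so the rest of the list is never read.
    static List<Integer> topK(int[] postings, int k) {
        List<Integer> hits = new ArrayList<>();
        for (int docId : postings) {
            if (hits.size() >= k) {
                break; // early termination: remaining docs cannot outrank the collected ones
            }
            hits.add(docId);
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] postings = {2, 5, 9, 14, 27, 31}; // hypothetical matches for a query
        System.out.println(topK(postings, 3));
    }
}
```

The cost of answering the query is then bounded by k rather than by the length of the postings list, which is what makes the approach attractive over hundreds of millions of documents.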

Re: codecs for sorted indexes

2012-04-12 Thread Robert Muir
On Thu, Apr 12, 2012 at 6:35 PM, Carlos Gonzalez-Cadenas c...@experienceon.com wrote: Hello Michael, Yes, we are pre-sorting the documents before adding them to the index. We have a score associated with every document (not an IR score but a document-related score that reflects its importance).