Hello,

We're using a sorted index to implement early termination efficiently over an index of hundreds of millions of documents. So far we've been using the default codecs that ship with Lucene 4, but we believe that, because the docids are sorted, we should be able to do much better in terms of storage and performance, especially decompression performance.
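A minimal sketch of the kind of saving we have in mind, assuming that sorting tends to cluster a term's matching documents into runs of consecutive docids (that is an assumption about the data, not something we have measured). It delta-encodes the docids and then run-length codes the gaps. The class and method names are made up for illustration, and this is plain Java, not the Lucene codec API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a Lucene codec.
final class GapRunLengthSketch {

    // Encodes strictly increasing docids as (gap, runLength) pairs.
    // Gaps are taken relative to a virtual previous docid of -1,
    // so every gap is at least 1.
    public static int[] encode(int[] docIds) {
        List<Integer> pairs = new ArrayList<>();
        int prev = -1;
        int i = 0;
        while (i < docIds.length) {
            int gap = docIds[i] - prev;
            int run = 0;
            // Consume as many docids as share this same gap.
            while (i < docIds.length && docIds[i] - prev == gap) {
                prev = docIds[i];
                i++;
                run++;
            }
            pairs.add(gap);
            pairs.add(run);
        }
        int[] result = new int[pairs.size()];
        for (int k = 0; k < result.length; k++) {
            result[k] = pairs.get(k);
        }
        return result;
    }

    // Rebuilds the original docids from the (gap, runLength) pairs.
    public static int[] decode(int[] pairs) {
        int total = 0;
        for (int k = 1; k < pairs.length; k += 2) {
            total += pairs[k];
        }
        int[] docIds = new int[total];
        int prev = -1;
        int n = 0;
        for (int k = 0; k < pairs.length; k += 2) {
            for (int r = 0; r < pairs[k + 1]; r++) {
                prev += pairs[k];
                docIds[n++] = prev;
            }
        }
        return docIds;
    }
}
```

When the docids are mostly consecutive, the pair list is much shorter than the posting list. When they are not, it can be up to twice as long, which is part of why we are unsure whether this beats a block codec such as PFOR.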
Robert Muir commented along these lines here:
https://issues.apache.org/jira/browse/LUCENE-2482?focusedCommentId=12982411&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12982411

We're aware that different codecs are being implemented, and different experiments are being run, in the bulkpostings branch. We don't know whether we should implement our own codec (e.g. using some RLE-like techniques) or use one of the codecs implemented there (PFOR, Simple64, ...).

Can you please give us some advice on this?

Thanks
Carlos

Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com
Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas