Hello,

We're using a sorted index in order to implement early termination
efficiently over an index of hundreds of millions of documents. As of now,
we're using the default codecs coming with Lucene 4, but we believe that
due to the fact that the docids are sorted, we should be able to do much
better in terms of storage and achieve much better performance, especially
decompression performance.

In particular, Robert Muir is commenting on these lines here:

https://issues.apache.org/jira/browse/LUCENE-2482?focusedCommentId=12982411&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12982411

We're aware that the in the bulkpostings branch there are different codecs
being implemented and different experiments being done. We don't know
whether we should implement our own codec (i.e. using some RLE-like
techniques) or we should use one of the codecs implemented there (PFOR,
Simple64, ...).

Can you please give us some advice on this?

Thanks
Carlos

Carlos Gonzalez-Cadenas
CEO, ExperienceOn - New generation search
http://www.experienceon.com

Mobile: +34 652 911 201
Skype: carlosgonzalezcadenas
LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas

Reply via email to