expose lastDocId in the posting from the TermEnum API
-----------------------------------------------------

                 Key: LUCENE-1612
                 URL: https://issues.apache.org/jira/browse/LUCENE-1612
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
    Affects Versions: 2.4
            Reporter: John Wang


The TermEnum API currently has docFreq(), which gives the number of docs in
the posting.
It would be good to also have the max docid in the posting. That information is
useful when constructing a custom DocIdSet, e.g. to determine the sparseness of
the doc list and decide whether or not to use a BitSet.
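As an illustration of why the max docid helps, here is a minimal sketch. It assumes the DocIdSet is materialized either as a java.util.BitSet sized to lastDocId + 1 or as a sorted int[] of doc ids. The class and method names are hypothetical, and the memory-based rule is only one possible sparseness criterion. Without lastDocId, the only upper bound available for sizing the BitSet is the reader's maxDoc().

```java
import java.util.BitSet;

// Hypothetical helper: pick a DocIdSet representation from docFreq and lastDocId.
final class DocIdSetChooser {

    /**
     * A BitSet covering doc ids 0..lastDocId needs roughly (lastDocId + 1) / 8 bytes,
     * while a sorted int[] needs 4 bytes per matching doc. Prefer whichever is smaller.
     */
    static boolean useBitSet(int docFreq, int lastDocId) {
        long bitSetBytes = ((long) lastDocId + 8) / 8; // ceil((lastDocId + 1) / 8)
        long intArrayBytes = 4L * docFreq;
        return bitSetBytes <= intArrayBytes;
    }

    /** Builds the dense form; every id in docIds must lie in 0..lastDocId. */
    static BitSet toBitSet(int[] docIds, int lastDocId) {
        BitSet bits = new BitSet(lastDocId + 1); // size hint only; BitSet grows if needed
        for (int doc : docIds) {
            bits.set(doc);
        }
        return bits;
    }
}
```

For example, a term with docFreq 3 whose last doc is 1,000,000 would stay an int[] (12 bytes versus about 125 KB), while a term matching half of 100,001 docs would switch to a BitSet.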

I have written a patch to do this. The problem is that TermInfosWriter
encodes these values as VInt/VLong, which leaves very little flexibility to add
lastDocId while keeping the index backward compatible. (If a plain int were
used for, say, docFreq, one bit could be used to flag that a new piece of
information follows.) The relevant lines in TermInfosWriter:

    output.writeVInt(ti.docFreq);                            // write doc freq
    output.writeVLong(ti.freqPointer - lastTi.freqPointer);  // write pointers
    output.writeVLong(ti.proxPointer - lastTi.proxPointer);
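To make the constraint concrete, here is a minimal sketch of a VInt writer and reader following the same scheme as Lucene's IndexOutput.writeVInt / IndexInput.readVInt (7 payload bits per byte, high bit set means another byte follows), written over plain java.io byte streams rather than Lucene's classes. Every payload bit of an old-format VInt carries part of the value, so a new reader has no spare bit to test. The last three methods sketch the fixed-width alternative from the parenthetical above; their names are hypothetical and this is not how Lucene stores docFreq.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

final class VIntSketch {

    // 7 payload bits per byte, low-order group first; high bit means "more bytes follow".
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    static int readVInt(ByteArrayInputStream in) {
        int b = in.read();
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.read();
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    // Hypothetical fixed-width variant: docFreq is never negative, so the sign bit
    // of a plain 32-bit int is free to flag "a lastDocId value follows".
    static int packDocFreq(int docFreq, boolean hasLastDocId) {
        return hasLastDocId ? (docFreq | 0x80000000) : docFreq;
    }

    static boolean hasLastDocId(int packed) {
        return packed < 0;
    }

    static int docFreq(int packed) {
        return packed & 0x7FFFFFFF;
    }
}
```

For instance, 300 encodes as the two bytes 0xAC 0x02, and any value up to 127 takes a single byte; none of those bits is left over for a flag.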

Anyway, the attached patch includes a modified TestSegmentTermEnum that tests
this. TestBackwardsCompatibility fails for the reasons described above.
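One way such a format change breaks compatibility: a reader that expects the extra field, run against data written in the old layout, silently consumes the next term's bytes. Below is a minimal sketch of that effect, assuming an entry consists only of the three values from the snippet above plus an optional lastDocId; the real term dictionary entry also stores the term text and other data, and the class and method names are hypothetical. Because docFreq and the pointer deltas are non-negative, VInt and VLong produce the same bytes for them, so the sketch uses VLong throughout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

final class TermEntryCompat {

    static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    static long readVLong(ByteArrayInputStream in) {
        int b = in.read();
        long v = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.read();
            v |= (long) (b & 0x7F) << shift;
        }
        return v;
    }

    // Each entry is {docFreq, freqPointer, proxPointer, lastDocId}.
    // Old layout: docFreq, freqPointer delta, proxPointer delta.
    // Hypothetical new layout: the same three values followed by lastDocId.
    static byte[] write(long[][] entries, boolean withLastDocId) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long lastFreq = 0, lastProx = 0;
        for (long[] e : entries) {
            writeVLong(out, e[0]);
            writeVLong(out, e[1] - lastFreq);
            writeVLong(out, e[2] - lastProx);
            if (withLastDocId) {
                writeVLong(out, e[3]);
            }
            lastFreq = e[1];
            lastProx = e[2];
        }
        return out.toByteArray();
    }

    // Reads the first entry (its deltas equal the absolute pointers).
    // Returns {docFreq, freqPointer, proxPointer, lastDocId}, with -1 if not read.
    static long[] readFirst(byte[] data, boolean withLastDocId) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        long docFreq = readVLong(in);
        long freqPointer = readVLong(in);
        long proxPointer = readVLong(in);
        long lastDocId = withLastDocId ? readVLong(in) : -1;
        return new long[] {docFreq, freqPointer, proxPointer, lastDocId};
    }
}
```

Writing the entries {3, 10, 20, 99} and {5, 25, 40, 199} in the old layout and reading them with withLastDocId = true yields lastDocId = 5, which is really the second term's docFreq, and every later read is shifted by one field.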

-- 
This message is automatically generated by JIRA.