Martijn van Groningen created LUCENE-7304:
---------------------------------------------

             Summary: Doc values based block join implementation
                 Key: LUCENE-7304
                 URL: https://issues.apache.org/jira/browse/LUCENE-7304
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Martijn van Groningen
            Priority: Minor


At query time, the block join relies on a bitset to find the previous parent
doc while advancing the doc id iterator. On large indices these bitsets can
consume large amounts of JVM heap space. Also, because of the way these
bitsets are typically populated, the 'FixedBitSet' implementation is used.

The idea I had was to replace the bitset usage with a numeric doc values field
that stores offsets. Each child doc stores how many doc ids it is away from its
parent doc, and each parent stores how many doc ids it is away from its first
child. At query time this information can be used to perform the block join.
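
To make the offset idea concrete, here is a minimal sketch in plain Java. It
is not Lucene code and not a proposed patch: the class BlockOffsets and its
methods are hypothetical names, and a long[] stands in for the numeric doc
values field. It assumes the usual block layout (children first, parent last
in each block). The issue does not say how a parent value is told apart from
a child value, so the sketch assumes a sign convention: children store a
positive distance to their parent, parents store the negated distance to
their first child.

```java
import java.util.Arrays;

/** Illustrative sketch of the offset encoding; all names are hypothetical. */
final class BlockOffsets {

    /**
     * Builds one offset per doc id for an index made of consecutive blocks.
     * blockSizes[i] is the number of docs in block i (children plus one parent).
     * Assumed layout: children first, parent last.
     * Assumed sign convention: child = +distance to parent,
     * parent = -(distance to first child), 0 for a parent without children.
     */
    static long[] encode(int[] blockSizes) {
        int maxDoc = Arrays.stream(blockSizes).sum();
        long[] offsets = new long[maxDoc];
        int firstChild = 0;
        for (int size : blockSizes) {
            if (size < 1) {
                throw new IllegalArgumentException("a block needs at least a parent doc");
            }
            int parent = firstChild + size - 1;
            for (int doc = firstChild; doc < parent; doc++) {
                offsets[doc] = parent - doc;          // child: distance to its parent
            }
            offsets[parent] = -(parent - firstChild); // parent: negated distance to first child
            firstChild = parent + 1;                  // next block starts after this parent
        }
        return offsets;
    }

    /** Under the assumed sign convention, parents hold values <= 0. */
    static boolean isParent(long[] offsets, int doc) {
        return offsets[doc] <= 0;
    }

    /** Returns the parent doc id of any doc; a parent maps to itself. */
    static int parentOf(long[] offsets, int doc) {
        return isParent(offsets, doc) ? doc : doc + (int) offsets[doc];
    }

    /** Returns the first child of a parent, or the parent itself if it has no children. */
    static int firstChildOf(long[] offsets, int parentDoc) {
        if (!isParent(offsets, parentDoc)) {
            throw new IllegalArgumentException("doc " + parentDoc + " is not a parent");
        }
        return parentDoc + (int) offsets[parentDoc];
    }
}
```

With offsets like these, both directions of the join become a single value
lookup per doc instead of a scan over a heap-resident bitset, and a doc's
membership in a block can be read directly from its stored value.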

I think another benefit of this approach is that external tools can now easily
determine whether a doc is part of a block of documents. Perhaps this also
helps index-time sorting?



