[ 
https://issues.apache.org/jira/browse/LUCENE-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125350#comment-13125350
 ] 

Doug Cutting commented on LUCENE-3510:
--------------------------------------

Using a single bit to track prohibited terms, plus a count for required terms, 
seems reasonable.

I don't recall the exact history of the original implementation.  I think it 
may have been intended to support more complex boolean expressions.  Any 
boolean expression can be rewritten in disjunctive normal form (DNF), which can 
then be evaluated with a set of required/prohibited mask pairs, one per 
conjunctive clause.  This is something I'd implemented previously and probably 
had in mind when implementing BooleanScorer.  A Lucene boolean query is 
effectively a single such conjunctive clause, since the optional terms can be 
ignored when evaluating the boolean expression, so it would reduce to a single 
pair of masks.  But, as you observe, this single-clause DNF case can be further 
simplified to a boolean and a count of required terms.  Does that make sense?
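
To make the two schemes concrete, here is a minimal Java sketch.  It is not 
the BooleanScorer code: the names MaskPair, Bucket and matchesDnf are 
hypothetical, and it assumes each clause is assigned one bit and that each 
required clause reports at most one match per document.

```java
// Illustrative sketch only; none of these names come from Lucene.
final class BooleanMatchSketch {

    /** One conjunctive clause of a DNF expression, as a pair of bit masks. */
    static final class MaskPair {
        final int required;   // bits that must all be set for the document
        final int prohibited; // bits that must all be clear for the document

        MaskPair(int required, int prohibited) {
            this.required = required;
            this.prohibited = prohibited;
        }

        boolean matches(int docBits) {
            return (docBits & required) == required
                && (docBits & prohibited) == 0;
        }
    }

    /** General DNF case: the document matches if any conjunctive clause matches. */
    static boolean matchesDnf(int docBits, MaskPair... clauses) {
        for (MaskPair clause : clauses) {
            if (clause.matches(docBits)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Single-clause case: per-document state shrinks to one boolean plus a
     * counter, so the clause count is no longer bounded by the mask width.
     * Assumes each required clause calls onRequiredMatch() at most once.
     */
    static final class Bucket {
        boolean prohibitedHit;
        int requiredHits;

        void onRequiredMatch()   { requiredHits++; }
        void onProhibitedMatch() { prohibitedHit = true; }

        boolean matches(int requiredClauseCount) {
            return !prohibitedHit && requiredHits == requiredClauseCount;
        }
    }
}
```

With an int mask, MaskPair can address at most 32 clauses, which is the limit 
described in the issue.  Bucket has no such limit, but it is only equivalent 
to a single conjunctive clause, not to a general DNF expression.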
                
> BooleanScorer should not limit number of prohibited clauses
> -----------------------------------------------------------
>
>                 Key: LUCENE-3510
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3510
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.5, 4.0
>
>         Attachments: LUCENE-3510.patch
>
>
> Today it's limited to 32, because it uses a separate bit in the mask
> for each clause.
> But I don't understand why it does this; I think all prohibited
> clauses can share a single boolean/bit?  Any match on a prohibited
> clause sets this bit and the doc is not collected; we don't need each
> prohibited clause to have a dedicated bit?
> We also use the mask for required clauses, but this code is now
> commented out (we always use BS2 if there are any required clauses);
> if we re-enable this code (and I think we should, at least in certain
> cases: I suspect it'd be faster than BS2 in many cases), I think we
> can cut over to an int count instead of bit masks, and then have no
> limit on the required clauses sent to BooleanScorer either.
> Separately, I cleaned up a few things in BooleanScorer: all of the
> embedded scorer methods (nextDoc, docID, advance, score) now throw
> UOE, and the buckets are pre-allocated instead of being allocated
> lazily per-sub-collect.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
