Highlighting and maxBooleanClauses limit

Ken Stanley Tue, 02 Nov 2010 07:15:15 -0700

By default, solrconfig.xml has maxBooleanClauses set to 1024, which in
my opinion should be more than enough clauses in general. Recently, we have
been noticing errors in our Catalina log: SEVERE:
org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set
to 2048. As a temporary (and quick) workaround, we tried increasing
maxBooleanClauses to 2048, but we are still hitting the limit. The full
error (including the query that ran before the error) is:


INFO: [bizjournals] webapp=/solr path=/select/
params={facet=true&sort=df_date_published+asc&hl=true&version=2.2&facet.field=facet_type&facet.field=facet_author&facet.field=facet_arr_industries&fq=df_date_published:[*+TO+NOW]&hl.requireFieldMatch=true&hl.fragsize=75&facet.mincount=1&indent=on&hl.fl=df_text_content&wt=xml&rows=25&hl.snippets=2&hl.maxAlternateFieldLength=150&start=0&q=(df_text_blog_name:"farm+bill")+OR+((df_text_headline:[*+TO+*]+AND+df_date_published:[*+TO+NOW])+AND+((df_text_author:"farm+bill")+OR+(df_text_content:"farm+bill")+OR+(df_text_headline:"farm+bill")+OR+(df_text_blog_name:"farm+bill")))&hl.alternateField=df_text_content&hl.usePhraseHighlighter=true}
hits=269 status=500 QTime=729
Nov 2, 2010 4:10:09 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 2048
        at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:153)
        at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:144)
        at org.apache.lucene.search.MultiTermQuery$ScoringBooleanQueryRewrite.rewrite(MultiTermQuery.java:110)
        at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:382)
        at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:178)
        at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111)
        at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111)
        at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111)
        at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414)
        at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
        at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
        at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
        at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:619)

I've noticed in the stack trace that this exception occurs when trying to
build the query for the highlighting; I've confirmed this by copying the
params and changing hl=true to hl=false. Unfortunately, when using
debugQuery=on, I do not see any details on what is going on with the
highlighting portion of the query (after artificially increasing the
maxBooleanClauses so the query will run).
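
For anyone who wants to repeat that check, here is a minimal Python sketch of
the parameter swap. The function name set_highlighting and the shortened
parameter string are made up for illustration; only the hl parameter and the
hl.* parameters come from the request logged above. It only rewrites the
query string and does not contact Solr.

```python
from urllib.parse import parse_qsl, urlencode


def set_highlighting(query_string: str, enabled: bool) -> str:
    """Return query_string with hl forced on or off; all other params are kept."""
    value = "true" if enabled else "false"
    pairs = parse_qsl(query_string, keep_blank_values=True)
    # Only the exact key "hl" is changed; hl.fl, hl.snippets, etc. stay as they are.
    pairs = [(key, value if key == "hl" else val) for key, val in pairs]
    if not any(key == "hl" for key, _ in pairs):
        pairs.append(("hl", value))
    return urlencode(pairs)


# Hypothetical, shortened stand-in for the logged params (the real q is much longer).
logged = ("facet=true&hl=true&hl.fl=df_text_content&hl.snippets=2"
          "&rows=25&q=df_text_content:%22farm+bill%22")

print(set_highlighting(logged, False))
```

Sending the original string and the rewritten one to the same /select/ handler
and comparing the status codes is the comparison described above.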

With all of that said, my questions to the list are: Is there a way to
determine exactly how the highlighter is building its query (i.e., some sort
of highlighting debug setting)? Is highlighting in Solr intended to be held
to the same restriction (maxBooleanClauses) as the query parser, even though
the highlighting query is built internally?

I am not a Solr expert by any stretch, and as such, I just don't understand
how two words on one field (as noted by the use of
hl.fl=df_text_content + hl.requireFieldMatch=true +
hl.usePhraseHighlighter=true) could somehow exceed the limits of both 1024
and 2048. I am concerned that continuing to increase maxBooleanClauses does
not actually solve anything, and that it may simply be begging for problems
later on down the road.
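
My only guess so far comes from the trace itself: the exception is thrown from
MultiTermQuery.rewrite via ScoringBooleanQueryRewrite, and the q parameter
contains two range clauses (df_text_headline:[* TO *] and
df_date_published:[* TO NOW]). If the highlighter expands such a range into
one boolean clause per matching indexed term, the phrase itself would not be
the culprit. The Python toy model below only illustrates that assumption; it
is not Lucene's code, and the class, function, and term names are all
hypothetical.

```python
class TooManyClauses(Exception):
    """Stand-in for BooleanQuery$TooManyClauses; not the Lucene class."""


def rewrite_range_to_clauses(indexed_terms, lower, upper, max_clause_count):
    """Toy model: expand an inclusive range into one clause per matching term.

    A bound of None is open, like * in [* TO *].
    """
    clauses = []
    for term in sorted(indexed_terms):
        if lower is not None and term < lower:
            continue
        if upper is not None and term > upper:
            continue
        if len(clauses) >= max_clause_count:
            raise TooManyClauses(f"maxClauseCount is set to {max_clause_count}")
        clauses.append(term)
    return clauses


# Hypothetical field containing 5000 distinct indexed terms.
terms = {f"term{i:04d}" for i in range(5000)}

# A narrow range stays well under the limit.
print(len(rewrite_range_to_clauses(terms, "term0000", "term0009", 2048)))

# An open-ended range matches every term and trips the limit.
try:
    rewrite_range_to_clauses(terms, None, None, 2048)
except TooManyClauses as exc:
    print("open range failed:", exc)
```

If that model is anywhere near right, the number of clauses would depend on
how many distinct terms the range fields hold, not on the two words being
searched for, which would explain why doubling the limit did not help.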

For the sake of completeness, here are the definitions of the field I'm
highlighting on (schema.xml):

        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms/synonyms.txt" ignoreCase="true" expand="true" />
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
            </analyzer>
        </fieldType>

        <dynamicField name="df_text_*" type="text" indexed="true" stored="true" />

    <solrQueryParser defaultOperator="OR" />

And here is my highlighter definition (solrconfig.xml):

    <highlighting>
        <!-- Configure the standard fragmenter -->
        <!-- This could most likely be commented out in the "default" case -->
        <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
            <lst name="defaults">
                <int name="hl.fragsize">255</int>
            </lst>
        </fragmenter>

        <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
        <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
            <lst name="defaults">
                <!-- slightly smaller fragsizes work better because of slop -->
                <int name="hl.fragsize">70</int>
                <!-- allow 50% slop on fragment sizes -->
                <float name="hl.regex.slop">0.5</float>
                <!-- a basic sentence pattern -->
                <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
            </lst>
        </fragmenter>

        <!-- Configure the standard formatter -->
        <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
            <lst name="defaults">
                <str name="hl.simple.pre"><![CDATA[<em>]]></str>
                <str name="hl.simple.post"><![CDATA[</em>]]></str>
            </lst>
        </formatter>
    </highlighting>

It is worth noting that I have not done anything (except formatting) to the
highlighting configuration in solrconfig.xml. Any help, assistance, and/or
guidance that can be provided would be greatly appreciated.

Thank you,

Ken Stanley

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
                -- Douglas Adams, "The Hitchhikers Guide to the Galaxy"
