By default, solrconfig.xml sets maxBooleanClauses to 1024, which in my opinion should be more than enough clauses in general. Recently, we have been noticing errors in our Catalina log: SEVERE: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 2048. As a temporary (and quick) workaround, we increased maxBooleanClauses to 2048, but we are still hitting the limit. The full error (including the query that ran before the error) is:

INFO: [bizjournals] webapp=/solr path=/select/ params={facet=true&sort=df_date_published+asc&hl=true&version=2.2&facet.field=facet_type&facet.field=facet_author&facet.field=facet_arr_industries&fq=df_date_published:[*+TO+NOW]&hl.requireFieldMatch=true&hl.fragsize=75&facet.mincount=1&indent=on&hl.fl=df_text_content&wt=xml&rows=25&hl.snippets=2&hl.maxAlternateFieldLength=150&start=0&q=(df_text_blog_name:"farm+bill")+OR+((df_text_headline:[*+TO+*]+AND+df_date_published:[*+TO+NOW])+AND+((df_text_author:"farm+bill")+OR+(df_text_content:"farm+bill")+OR+(df_text_headline:"farm+bill")+OR+(df_text_blog_name:"farm+bill")))&hl.alternateField=df_text_content&hl.usePhraseHighlighter=true} hits=269 status=500 QTime=729
Nov 2, 2010 4:10:09 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 2048
    at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:153)
    at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:144)
    at org.apache.lucene.search.MultiTermQuery$ScoringBooleanQueryRewrite.rewrite(MultiTermQuery.java:110)
    at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:382)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:178)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414)
    at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
    at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
    at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:619)

I've noticed in the stack trace that this exception occurs when trying to build the query for the highlighting; I've confirmed this by copying the params and changing hl=true to hl=false. Unfortunately, when using debugQuery=on, I do not see any details on what is going on with the highlighting portion of the query (after artificially increasing the maxBooleanClauses so the query will run).
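
In case anyone wants to repeat the hl=true versus hl=false comparison, here is a minimal sketch of how the params can be copied and toggled. It assumes the logged params string can be treated as an ordinary URL query string; the function name set_highlighting and the shortened sample params are my own, for illustration only, and are not anything Solr provides.

```python
from urllib.parse import parse_qsl, urlencode


def set_highlighting(query_string: str, enabled: bool) -> str:
    """Return query_string with hl forced on or off; other params keep their order."""
    value = "true" if enabled else "false"
    pairs = []
    replaced = False
    for key, val in parse_qsl(query_string, keep_blank_values=True):
        if key == "hl":
            if not replaced:  # keep a single hl parameter
                pairs.append((key, value))
                replaced = True
        else:
            pairs.append((key, val))
    if not replaced:  # hl was absent, so add it explicitly
        pairs.append(("hl", value))
    return urlencode(pairs)


# Shortened, hypothetical version of the logged params (not the full request).
logged = 'facet=true&hl=true&hl.fl=df_text_content&q=df_text_content:"farm+bill"'
print(set_highlighting(logged, False))
```

The returned string can be appended to the /select/ URL of the Solr instance in question. Note that the rebuilt string is percent-encoded, so it will not match the log line character for character even though the parameters are the same.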

With all of that said, my questions to the list are:

1. Is there a way to determine how exactly the highlighter is building its query (i.e., some sort of highlighting debug setting)?
2. Is the behavior of highlighting in SOLR intended to be held to the same restrictions (maxBooleanClauses) as the query parser (even though the highlighting query is built internally)?

I am not a SOLR expert by any measure, and as such, I just don't understand how two words on one field (as noted by the use of hl.fl=df_text_content + hl.requireFieldMatch=true + hl.usePhraseHighlighter=true) could somehow exceed the limits of both 1024 and 2048. I am concerned that even if I continue increasing maxBooleanClauses, I am not actually solving anything; if anything, I am begging for problems later on down the road.

For the sake of completeness, here are the definitions of the field I'm highlighting on (schema.xml):

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms/synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
  </analyzer>
</fieldType>
<dynamicField name="df_text_*" type="text" indexed="true" stored="true" />
<solrQueryParser defaultOperator="OR" />

And here is my highlighter definition (solrconfig.xml):

<highlighting>
  <!-- Configure the standard fragmenter -->
  <!-- This could most likely be commented out in the "default" case -->
  <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
    <lst name="defaults">
      <int name="hl.fragsize">255</int>
    </lst>
  </fragmenter>
  <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
    </lst>
  </fragmenter>
  <!-- Configure the standard formatter -->
  <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
    <lst name="defaults">
      <str name="hl.simple.pre"><![CDATA[<em>]]></str>
      <str name="hl.simple.post"><![CDATA[</em>]]></str>
    </lst>
  </formatter>
</highlighting>

It is worth noting that I have not done anything (except formatting) to the highlighting configuration in solrconfig.xml.

Any help, assistance, and/or guidance that can be provided would be greatly appreciated.

Thank you,
Ken Stanley

It looked like something resembling white marble, which was probably what it was: something resembling white marble. -- Douglas Adams, "The Hitchhikers Guide to the Galaxy"