I am troubleshooting a ranking difference between search terms that contain a "-" and the same terms without it, e.g. "high-tech" vs "high tech". The field I am querying uses the standard tokenizer, so I would expect the underlying Lucene query to be the same for both versions. However, the debug output shows that they are parsed differently.

I know "-" has special meaning in Lucene query syntax and must be escaped, but escaping does not fix the problem. With the "-" present, the pf2 edismax parameter appears to be ignored: no pf2 phrase clause shows up in the final query. We use sow=false because we have multi-term synonyms and need them to be included in the final Lucene query.

My expectation is that the final Lucene query should be based on the output of the field analyzer. However, after briefly looking at the code for ExtendedDismaxQParser, it appears that some string processing happens outside of the analysis step, and that this is what produces the unexpected Lucene query.
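To make the suspicion concrete, here is a toy Python model of what I think might be happening. It is not Solr's code: `split_clauses`, `analyze`, and `phrase_shingles` are hypothetical names, the regex is only a rough stand-in for the standard tokenizer plus lowercasing (no stemming), and the idea that pf2 shingles are built from whitespace-separated clauses before analysis is an assumption based on my brief reading of the parser.

```python
import re


def split_clauses(query):
    """Assumed pre-analysis step: split the raw query string on whitespace only."""
    return query.split()


def analyze(text):
    """Rough stand-in for StandardTokenizer + LowerCaseFilter (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_shingles(query, size):
    """Build phrase candidates from `size` consecutive clauses (pf2 -> size=2).

    Assumption: when the query has fewer clauses than `size`,
    no phrase candidate is generated at all.
    """
    clauses = split_clauses(query)
    if len(clauses) < size:
        return []
    return [
        " ".join(analyze(" ".join(clauses[i:i + size])))
        for i in range(len(clauses) - size + 1)
    ]


for q in ("high tech", "high-tech"):
    print(q, "->", phrase_shingles(q, 2))
```

In this model both inputs analyze to the same two tokens, but "high-tech" is a single clause before analysis, so no two-clause shingle is ever built for it. That would match the missing pf2 clause in the debug output, if the assumption holds.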
Solr debug for "high tech":

parsedquery: "+(DisjunctionMaxQuery((Name_enUS:high)~0.4) DisjunctionMaxQuery((Name_enUS:tech)~0.4))~2 DisjunctionMaxQuery((Name_enUS:"high tech"~5)~0.4) DisjunctionMaxQuery((Name_enUS:"high tech"~4)~0.4)",
parsedquery_toString: "+(((Name_enUS:high)~0.4 (Name_enUS:tech)~0.4)~2) (Name_enUS:"high tech"~5)~0.4 (Name_enUS:"high tech"~4)~0.4"

Solr debug for "high-tech":

parsedquery: "+DisjunctionMaxQuery((((Name_enUS:high Name_enUS:tech)~2))~0.4) DisjunctionMaxQuery((Name_enUS:"high tech"~5)~0.4)",
parsedquery_toString: "+(((Name_enUS:high Name_enUS:tech)~2))~0.4 (Name_enUS:"high tech"~5)~0.4"

SolrConfig:

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
    <str name="indent">true</str>
    <str name="wt">json</str>
    <str name="mm">3&lt;75%</str>
    <str name="qf">Name_enUS</str>
    <str name="pf">Name_enUS</str>
    <str name="ps">5</str>
    <!---->
    <str name="pf2">Name_enUS</str>
    <str name="ps2">4</str>
    <!---->
    <str name="qs">3</str>
    <!---->
    <str name="tie">0.4</str>
    <str name="echoParams">explicit</str>
    <int name="rows">100</int>
    <str name="sow">false</str>
  </lst>
  <lst name="invariants">
    <str name="defType">edismax</str>
  </lst>
</requestHandler>

Schema:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>

Using Solr 8.6.3
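For anyone who wants to reproduce the comparison, a small Python sketch along these lines builds the three debug requests (unescaped space, unescaped dash, escaped dash). The base URL is a placeholder: host, port, and the core name `mycore` are hypothetical, and only the `/search` handler comes from the config above.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port, and core name are placeholders.
SOLR_BASE = "http://localhost:8983/solr/mycore"


def debug_url(query, handler="/search"):
    """Build a request URL that asks Solr for query-parsing debug output."""
    # debug=query limits the debug section to query parsing details;
    # rows=0 skips the result documents, which are not needed here.
    params = {"q": query, "debug": "query", "rows": 0}
    return f"{SOLR_BASE}{handler}?{urlencode(params)}"


for q in ("high tech", "high-tech", r"high\-tech"):
    print(debug_url(q))
```

Fetching each URL and comparing the parsedquery value in the debug section of the response should show whether the pf2 clause is present for each variant.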