Parameters, no. You could use a PatternReplaceCharFilterFactory, though.

NOTE: *FilterFactory classes are _not_ what you want in this case; they are applied to individual tokens after the input has already been tokenized. *CharFilterFactory classes are invoked on the entire input to the field, although I can't say for certain that even that's early enough: char filters still run as part of field analysis, which happens after query parsing.

There are two other options to consider:

StatelessScriptUpdateProcessor
FieldMutatingUpdateProcessor

StatelessScriptUpdateProcessor is probably the easiest.

Best,
Erick

> On Nov 24, 2020, at 1:44 PM, Samuel Gutierrez <samuel.gutier...@iherb.com.INVALID> wrote:
>
> Are there any good workarounds/parameters we can use to fix this so it
> doesn't have to be solved client side?
>
> On Tue, Nov 24, 2020 at 7:50 AM matthew sporleder <msporle...@gmail.com> wrote:
>
>> Is the normal/standard solution here to regex remove the '-'s and
>> combine them into a single token?
>>
>> On Tue, Nov 24, 2020 at 8:00 AM Erick Erickson <erickerick...@gmail.com> wrote:
>>>
>>> This is a common point of confusion. There are two phases for creating a query:
>>> query _parsing_ first, then the analysis chain for the parsed result.
>>>
>>> So what edismax sees in the two cases is:
>>>
>>> Name_enUS:“high tech” -> two tokens; since there are two of them, pf2 comes into play.
>>>
>>> Name_enUS:“high-tech” -> there’s only one token, so pf2 doesn’t apply; splitting it on the hyphen comes later.
>>>
>>> It’s especially confusing since the field analysis then breaks up “high-tech” into two tokens that
>>> look the same as “high tech” in the debug response, just without the pf2 phrase query:
>>>
>>> Name_enUS:high
>>> Name_enUS:tech
>>>
>>> Best,
>>> Erick
>>>
>>>> On Nov 23, 2020, at 8:32 PM, Samuel Gutierrez <samuel.gutier...@iherb.com.INVALID> wrote:
>>>>
>>>> I am troubleshooting an issue with ranking for search terms that contain a
>>>> "-" vs. the same query without the dash, e.g. "high-tech" vs.
>>>> "high tech".
>>>> The field I am querying uses the standard tokenizer, so I would expect
>>>> the underlying Lucene query to be the same for both versions of the query;
>>>> however, the debug output shows they are generated differently. I know "-"
>>>> must be escaped as it has special meaning in Lucene, but escaping does not
>>>> fix the problem. It appears that with the "-" present, the pf2 edismax
>>>> parameter is not respected and is omitted from the final query. We use
>>>> sow=false because we have multi-term synonyms and need to ensure they are
>>>> included in the final Lucene query. My expectation is that the final
>>>> underlying Lucene query should be based on the output of the field analyzer;
>>>> however, after briefly looking at the code for ExtendedDismaxQParser, it
>>>> appears that some string processing happens outside of the analysis step,
>>>> which causes the unexpected Lucene query.
>>>>
>>>> Solr debug for "high tech":
>>>>
>>>> parsedquery: "+(DisjunctionMaxQuery((Name_enUS:high)~0.4)
>>>> DisjunctionMaxQuery((Name_enUS:tech)~0.4))~2
>>>> DisjunctionMaxQuery((Name_enUS:"high tech"~5)~0.4)
>>>> DisjunctionMaxQuery((Name_enUS:"high tech"~4)~0.4)",
>>>> parsedquery_toString: "+(((Name_enUS:high)~0.4
>>>> (Name_enUS:tech)~0.4)~2) (Name_enUS:"high tech"~5)~0.4
>>>> (Name_enUS:"high tech"~4)~0.4",
>>>>
>>>> Solr debug for "high-tech":
>>>>
>>>> parsedquery: "+DisjunctionMaxQuery((((Name_enUS:high Name_enUS:tech)~2))~0.4)
>>>> DisjunctionMaxQuery((Name_enUS:"high tech"~5)~0.4)",
>>>> parsedquery_toString: "+(((Name_enUS:high Name_enUS:tech)~2))~0.4
>>>> (Name_enUS:"high tech"~5)~0.4"
>>>>
>>>> SolrConfig:
>>>>
>>>> <requestHandler name="/search" class="solr.SearchHandler">
>>>>   <lst name="defaults">
>>>>     <str name="omitHeader">true</str>
>>>>     <str name="indent">true</str>
>>>>     <str name="wt">json</str>
>>>>     <str name="mm">3&lt;75%</str>
>>>>     <str name="qf">Name_enUS</str>
>>>>     <str name="pf">Name_enUS</str>
>>>>     <str name="ps">5</str>
>>>>     <str name="pf2">Name_enUS</str>
>>>>     <str name="ps2">4</str>
>>>>     <str name="qs">3</str>
>>>>     <str name="tie">0.4</str>
>>>>     <str name="echoParams">explicit</str>
>>>>     <int name="rows">100</int>
>>>>     <str name="sow">false</str>
>>>>   </lst>
>>>>   <lst name="invariants">
>>>>     <str name="defType">edismax</str>
>>>>   </lst>
>>>> </requestHandler>
>>>>
>>>> Schema:
>>>>
>>>> <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>>>>   <analyzer>
>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>>>>     <filter class="solr.SnowballPorterFilterFactory"/>
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>> Using Solr 8.6.3
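A toy Python model may make the parsing-then-analysis explanation in this thread more concrete. It is a minimal sketch, assuming (as Erick describes) that edismax builds pf2 bigram phrases from the whitespace-separated clauses of the raw q string before any field analysis runs. It ignores mm, ps2, sow, escaping and analysis entirely and is not Solr's actual ExtendedDismaxQParser code. The helpers split_clauses, pf2_phrases and normalize_hyphens are hypothetical names. normalize_hyphens illustrates the client-side workaround Samuel was hoping to avoid: turning a hyphen between two word characters into a space so "high-tech" reaches Solr as two clauses.

```python
import re
from typing import List


# Toy model only (hypothetical helper): splits the raw query string into
# clauses on whitespace, standing in for the clause splitting edismax does
# before field analysis. Not Solr's real parser.
def split_clauses(q: str) -> List[str]:
    return q.split()


def pf2_phrases(q: str) -> List[str]:
    """Bigram phrases a pf2-style boost would be built from (toy model)."""
    clauses = split_clauses(q)
    # Pair each clause with its right-hand neighbour; a single clause
    # such as "high-tech" therefore yields no bigrams at all.
    return [f'"{a} {b}"' for a, b in zip(clauses, clauses[1:])]


# Hypothetical client-side workaround: replace a hyphen only when it sits
# between two word characters, so "high-tech" becomes "high tech" while a
# leading "-" (Lucene's prohibit operator, as in "-tech") is left alone.
_INNER_HYPHEN = re.compile(r"(?<=\w)-(?=\w)")


def normalize_hyphens(q: str) -> str:
    return _INNER_HYPHEN.sub(" ", q)


if __name__ == "__main__":
    for q in ["high tech", "high-tech", normalize_hyphens("high-tech")]:
        print(repr(q), "->", pf2_phrases(q))
```

Run as a script, the hyphenated query yields an empty list while the normalized query yields the same "high tech" bigram as the space-separated one. If a workaround like this is used, it would have to run on the raw user input before any escaping, since an escaped high\-tech has a backslash, not a word character, in front of the hyphen.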