I'm seeing pf and pf3 clauses fail to generate in long queries containing synonyms. Has anyone else run into this, or should it be submitted as a bug in Jira? It is a showstopper for the current project, as pf and pf3 were pretty heavily tuned.

Using Solr 7.1; all fields use the following type.

With query-time synonyms:

<fieldType name="my_text_general" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords_nostem.txt"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.FlattenGraphFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" managed="synonyms_all" />
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords_nostem.txt"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <similarity class="solr.ClassicSimilarityFactory" />
</fieldType>

Without query-time synonyms (synonyms applied at index time):

<fieldType name="my_text_general" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" managed="synonyms_all" />
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords_nostem.txt"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.FlattenGraphFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1" protected="protwords_wdff.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords_nostem.txt"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <similarity class="solr.ClassicSimilarityFactory" />
</fieldType>

The synonyms file is pretty long, so I'll just include the relevant bits for an example:

allergic, hypersensitive
aspirin, acetylsalicylic acid
dog, canine, canis familiris, k 9
rat, rattus

The problem seems to occur when part of the query has a synonym, but the phrase as a whole does not.

I added whitespace to the output below to piece out what is going on; I believe any parenthesis errors are due to my tinkering. Beyond that, this is as it came from Solr. Slop has been adjusted to identify the PF/PF2/PF3 clauses: pf fields have a slop ending in 0, pf2 in 1, and pf3 in 2 (e.g. ~10, ~11, ~12).

=============
Example 1: "aspirin dose in rats"
==============

With query-time synonyms:
===============

/// Q terms generate as expected ///
+((((kw1:\"acetylsalicylic acid\" kw1:aspirin)^100.0 | (species:\"acetylsalicylic acid\" species:aspirin) | (keywords_bm25_no_norms:\"acetylsalicylic acid\" keywords_bm25_no_norms:aspirin)^50.0 | (description:\"acetylsalicylic acid\" description:aspirin) | (kw1ranked:\"acetylsalicylic acid\" kw1ranked:aspirin)^100.0 | (text:\"acetylsalicylic acid\" text:aspirin) | (title:\"acetylsalicylic acid\" title:aspirin)^100.0 | (keywordsranked_bm25_no_norms:\"acetylsalicylic acid\" keywordsranked_bm25_no_norms:aspirin)^50.0 | (authors:\"acetylsalicylic acid\" authors:aspirin))~0.4
((Synonym(kw1:dosage kw1:dose kw1:dose kw1:dose))^100.0 | Synonym(species:dosage species:dose species:dose species:dose) | (Synonym(keywords_bm25_no_norms:dosage keywords_bm25_no_norms:dose keywords_bm25_no_norms:dose keywords_bm25_no_norms:dose))^50.0 | Synonym(description:dosage description:dose description:dose description:dose) | (Synonym(kw1ranked:dosage kw1ranked:dose kw1ranked:dose kw1ranked:dose))^100.0 | Synonym(text:dosage text:dose text:dose text:dose) | (Synonym(title:dosage title:dose title:dose title:dose))^100.0 | (Synonym(keywordsranked_bm25_no_norms:dosage keywordsranked_bm25_no_norms:dose keywordsranked_bm25_no_norms:dose keywordsranked_bm25_no_norms:dose))^50.0 | Synonym(authors:dosage authors:dose authors:dose authors:dose))~0.4
((Synonym(kw1:rat kw1:rattu))^100.0 | Synonym(species:rat species:rattu) | (Synonym(keywords_bm25_no_norms:rat keywords_bm25_no_norms:rattu))^50.0 | Synonym(description:rat description:rattu) | (Synonym(kw1ranked:rat kw1ranked:rattu))^100.0 | Synonym(text:rat text:rattu) | (Synonym(title:rat title:rattu))^100.0 | (Synonym(keywordsranked_bm25_no_norms:rat keywordsranked_bm25_no_norms:rattu))^50.0 | Synonym(authors:rat authors:rattu))~0.4)~3)

/// PF and PF2 are missing. ///
() () () () ()

/// This is actually PF3 with a missing ? where the stopword 'in' belonged. ///
((title:\"(dosage dose dose dose) (rattu rat)\"~22)^1000.0 | (keywordsranked_bm25_no_norms:\"(dosage dose dose dose) (rattu rat)\"~22)^1000.0 | (text:\"(dosage dose dose dose) (rattu rat)\"~22)^100.0)~0.4
((keywords_bm25_no_norms:\"(dosage dose dose dose) (rattu rat)\"~12)^500.0 | (kw1ranked:\"(dosage dose dose dose) (rattu rat)\"~12)^100.0 | (kw1:\"(dosage dose dose dose) (rattu rat)\"~12)^100.0)~0.4,product(max(10.0/(3.16E-11*float(ms(const(1555545600000),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",

With index-time synonyms:
===============

/// Q ///
"boost(+((((kw1:aspirin)^100.0 | species:aspirin | (keywords_bm25_no_norms:aspirin)^50.0 | description:aspirin | (kw1ranked:aspirin)^100.0 | text:aspirin | (title:aspirin)^100.0 | (keywordsranked_bm25_no_norms:aspirin)^50.0 | authors:aspirin)~0.4
((kw1:dose)^100.0 | species:dose | (keywords_bm25_no_norms:dose)^50.0 | description:dose | (kw1ranked:dose)^100.0 | text:dose | (title:dose)^100.0 | (keywordsranked_bm25_no_norms:dose)^50.0 | authors:dose)~0.4
((kw1:rats)^100.0 | species:rats | (keywords_bm25_no_norms:rats)^50.0 | description:rats | (kw1ranked:rats)^100.0 | text:rats | (title:rats)^100.0 | (keywordsranked_bm25_no_norms:rats)^50.0 | authors:rats)~0.4)~3)

/// PF ///
((title:\"aspirin dose ? rats\"~20)^5000.0 | (keywordsranked_bm25_no_norms:\"aspirin dose ? rats\"~20)^5000.0 | (keywords_bm25_no_norms:\"aspirin dose ? rats\"~20)^1500.0 | (text:\"aspirin dose ? rats\"~20)^1000.0)~0.4
((kw1ranked:\"aspirin dose ? rats\"~10)^5000.0 | (kw1:\"aspirin dose ? rats\"~10)^500.0)~0.4
((authors:\"aspirin dose ? rats\")^250.0 | description:\"aspirin dose ? rats\")~0.4

/// PF2 ///
((text:\"aspirin dose ? rats\"~100)^500.0)~0.4
(authors:\"aspirin dose\"~11 | species:\"aspirin dose\"~11)~0.4

/// PF3 ///
(((title:\"aspirin dose\"~22)^1000.0 | (keywordsranked_bm25_no_norms:\"aspirin dose\"~22)^1000.0 | (text:\"aspirin dose\"~22)^100.0)~0.4 ((title:\"dose ? rats\"~22)^1000.0 | (keywordsranked_bm25_no_norms:\"dose ? rats\"~22)^1000.0 | (text:\"dose ? rats\"~22)^100.0)~0.4)
(((keywords_bm25_no_norms:\"aspirin dose\"~12)^500.0 | (kw1ranked:\"aspirin dose\"~12)^100.0 | (kw1:\"aspirin dose\"~12)^100.0)~0.4 ((keywords_bm25_no_norms:\"dose ? rats\"~12)^500.0 | (kw1ranked:\"dose ? rats\"~12)^100.0 | (kw1:\"dose ? rats\"~12)^100.0)~0.4),product(max(10.0/(3.16E-11*float(ms(const(1555545600000),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",

===============
Example 2: "allergic reaction dogs"
The underlying issue isn't specific to PF, PF2, or PF3. The following example picks up PF2, but not PF or PF3.
===============

With query-time synonyms:

/// Q ///
parsedquery_toString":"boost( +((((Synonym(kw1:allergic kw1:allergy kw1:hypersensitive kw1:hypersensitive))^100.0 | Synonym(species:allergic species:allergy species:hypersensitive species:hypersensitive) | (Synonym(keywords_bm25_no_norms:allergic keywords_bm25_no_norms:allergy keywords_bm25_no_norms:hypersensitive keywords_bm25_no_norms:hypersensitive))^50.0 | Synonym(description:allergic description:allergy description:hypersensitive description:hypersensitive) | (Synonym(kw1ranked:allergic kw1ranked:allergy kw1ranked:hypersensitive kw1ranked:hypersensitive))^100.0 | Synonym(text:allergic text:allergy text:hypersensitive text:hypersensitive) | (Synonym(title:allergic title:allergy title:hypersensitive title:hypersensitive))^100.0 | (Synonym(keywordsranked_bm25_no_norms:allergic keywordsranked_bm25_no_norms:allergy keywordsranked_bm25_no_norms:hypersensitive keywordsranked_bm25_no_norms:hypersensitive))^50.0 | Synonym(authors:allergic authors:allergy authors:hypersensitive authors:hypersensitive))~0.4
((kw1:reaction)^100.0 | species:reaction | (keywords_bm25_no_norms:reaction)^50.0 | description:reaction | (kw1ranked:reaction)^100.0 | text:reaction | (title:reaction)^100.0 | (keywordsranked_bm25_no_norms:reaction)^50.0 | authors:reaction)~0.4
((kw1:\"cani familiari\" kw1:canine kw1:\"k 9\" kw1:\"cani lupu familiari\" kw1:dog)^100.0 | (species:\"cani familiari\" species:canine species:\"k 9\" species:\"cani lupu familiari\" species:dog) | (keywords_bm25_no_norms:\"cani familiari\" keywords_bm25_no_norms:canine keywords_bm25_no_norms:\"k 9\" keywords_bm25_no_norms:\"cani lupu familiari\" keywords_bm25_no_norms:dog)^50.0 | (description:\"cani familiari\" description:canine description:\"k 9\" description:\"cani lupu familiari\" description:dog) | (kw1ranked:\"cani familiari\" kw1ranked:canine kw1ranked:\"k 9\" kw1ranked:\"cani lupu familiari\" kw1ranked:dog)^100.0 | (text:\"cani familiari\" text:canine text:\"k 9\" text:\"cani lupu familiari\" text:dog) | (title:\"cani familiari\" title:canine title:\"k 9\" title:\"cani lupu familiari\" title:dog)^100.0 | (keywordsranked_bm25_no_norms:\"cani familiari\" keywordsranked_bm25_no_norms:canine keywordsranked_bm25_no_norms:\"k 9\" keywordsranked_bm25_no_norms:\"cani lupu familiari\" keywordsranked_bm25_no_norms:dog)^50.0 | (authors:\"cani familiari\" authors:canine authors:\"k 9\" authors:\"cani lupu familiari\" authors:dog))~0.4)~3)

/// PF ///
() () () ()

/// PF2 ///
(authors:\"(hypersensitive allergy hypersensitive allergic) reaction\"~11 | species:\"(hypersensitive allergy hypersensitive allergic) reaction\"~11)~0.4

/// PF3 ///
() (), product(max(10.0/(3.16E-11*float(ms(const(1555545600000),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",

With index-time synonyms:

/// Q ///
+((((kw1:allergic)^100.0 | species:allergic | (keywords_bm25_no_norms:allergic)^50.0 | description:allergic | (kw1ranked:allergic)^100.0 | text:allergic | (title:allergic)^100.0 | (keywordsranked_bm25_no_norms:allergic)^50.0 | authors:allergic)~0.4
((kw1:reaction)^100.0 | species:reaction | (keywords_bm25_no_norms:reaction)^50.0 | description:reaction | (kw1ranked:reaction)^100.0 | text:reaction | (title:reaction)^100.0 | (keywordsranked_bm25_no_norms:reaction)^50.0 | authors:reaction)~0.4
((kw1:dog)^100.0 | species:dog | (keywords_bm25_no_norms:dog)^50.0 | description:dog | (kw1ranked:dog)^100.0 | text:dog | (title:dog)^100.0 | (keywordsranked_bm25_no_norms:dog)^50.0 | authors:dog)~0.4)~3)

/// PF ///
((title:\"allergic reaction dog\"~20)^5000.0 | (keywordsranked_bm25_no_norms:\"allergic reaction dog\"~20)^5000.0 | (keywords_bm25_no_norms:\"allergic reaction dog\"~20)^1500.0 | (text:\"allergic reaction dog\"~20)^1000.0)~0.4
((kw1ranked:\"allergic reaction dog\"~10)^5000.0 | (kw1:\"allergic reaction dog\"~10)^500.0)~0.4
((authors:\"allergic reaction dog\")^250.0 | description:\"allergic reaction dog\")~0.4
((text:\"allergic reaction dog\"~100)^500.0)~0.4

/// PF2 ///
((authors:\"allergic reaction\"~11 | species:\"allergic reaction\"~11)~0.4

/// PF3 ///
(authors:\"reaction dog\"~11 | species:\"reaction dog\"~11)~0.4)
((title:\"allergic reaction dog\"~22)^1000.0 | (keywordsranked_bm25_no_norms:\"allergic reaction dog\"~22)^1000.0 | (text:\"allergic reaction dog\"~22)^100.0)~0.4
((keywords_bm25_no_norms:\"allergic reaction dog\"~12)^500.0 | (kw1ranked:\"allergic reaction dog\"~12)^100.0 | (kw1:\"allergic reaction dog\"~12)^100.0)~0.4,product(max(10.0/(3.16E-11*float(ms(const(1555545600000),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",

I'm working on getting this rigged up in the debugger, but would appreciate any feedback.

Thank you,
Elizabeth
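
P.S. For anyone comparing their own debug output against the above, here is a rough Python sketch for tallying the phrase clauses in a parsedquery_toString value. It assumes the same slop convention used in this message (pf slop ending in 0, pf2 in 1, pf3 in 2) and treats every literal () as a clause that failed to generate. The function name summarize_phrase_clauses is made up for illustration and is not part of Solr, and phrase clauses that carry no slop at all are simply not counted.

```python
import re
from collections import Counter

# Assumption: the last digit of the slop encodes which parameter produced
# the clause (0 = pf, 1 = pf2, 2 = pf3), as in the examples in this message.
SLOP_SUFFIX_TO_KIND = {"0": "pf", "1": "pf2", "2": "pf3"}


def summarize_phrase_clauses(parsed_query):
    """Count sloppy phrase clauses by kind, plus empty () clauses.

    Hypothetical helper for reading edismax debug output; not a Solr API.
    """
    counts = Counter({"pf": 0, "pf2": 0, "pf3": 0, "other": 0})
    # A sloppy phrase ends with a closing quote followed by ~<digits>.
    # The dismax tie-breaker (~0.4) and mm (~3) follow ')' and never match.
    for slop in re.findall(r'"~(\d+)', parsed_query):
        counts[SLOP_SUFFIX_TO_KIND.get(slop[-1], "other")] += 1
    # Empty parentheses mark clauses that were requested but not generated.
    counts["empty"] = parsed_query.count("()")
    return dict(counts)


if __name__ == "__main__":
    sample = (
        r'() () ((title:\"dose rat\"~22)^1000.0 | (text:\"dose rat\"~22)^100.0)~0.4 '
        r'(authors:\"a b\"~11 | species:\"a b\"~11)~0.4'
    )
    print(summarize_phrase_clauses(sample))
```

Running the same function on the query-time and index-time output for one query gives a quick side-by-side of which clause kinds went missing.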