I'm seeing pf and pf3 clauses fail to generate in long queries containing
synonyms.  Has anyone else run into this, or should it be submitted as a
bug in Jira?  It is a showstopper for the current project, as the pf and
pf3 boosts were pretty heavily tuned.

I'm using Solr 7.1, and all fields use the following type:

With query-time synonyms:
<fieldType name="my_text_general" class="solr.TextField"
positionIncrementGap="100" autoGeneratePhraseQueries="true">
<analyzer type="index">
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1"
 protected="protwords_wdff.txt"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.EnglishMinimalStemFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory"
protected="protwords_nostem.txt"/>
<filter class="solr.KStemFilterFactory"/>
<filter class="solr.FlattenGraphFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
<analyzer type="query">
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="0" catenateNumbers="0"
catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1"
 protected="protwords_wdff.txt"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.EnglishMinimalStemFilterFactory"/>
<filter class="solr.SynonymGraphFilterFactory" managed="synonyms_all"/>
<filter class="solr.KeywordMarkerFilterFactory"
protected="protwords_nostem.txt"/>
<filter class="solr.KStemFilterFactory"/>
</analyzer>
<similarity class="solr.ClassicSimilarityFactory" />
</fieldType>

With index-time synonyms (no query-time synonyms):
<fieldType name="my_text_general" class="solr.TextField"
positionIncrementGap="100" autoGeneratePhraseQueries="true">
<analyzer type="index">
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1"
 protected="protwords_wdff.txt"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.EnglishMinimalStemFilterFactory"/>
<filter class="solr.SynonymGraphFilterFactory" managed="synonyms_all"/>
<filter class="solr.KeywordMarkerFilterFactory"
protected="protwords_nostem.txt"/>
<filter class="solr.KStemFilterFactory"/>
<filter class="solr.FlattenGraphFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
<analyzer type="query">
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(?i)\b(anti|hypo|hyper|non)[-\\/ ](\w+)\b" replacement="$1$2"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="0" catenateNumbers="0"
catenateAll="0" splitOnCaseChange="0" stemEnglishPossessive="1"
 protected="protwords_wdff.txt"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" />
<filter class="solr.TrimFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.EnglishMinimalStemFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory"
protected="protwords_nostem.txt"/>
<filter class="solr.KStemFilterFactory"/>
</analyzer>
<similarity class="solr.ClassicSimilarityFactory" />
</fieldType>

The synonyms file is pretty long, so I'll just include the relevant bits
for an example:

allergic, hypersensitive
aspirin, acetylsalicylic acid
dog, canine, canis familiaris, k 9
rat, rattus


The problem seems to occur when part of the query has a synonym, but the
whole phrase does not.  I added whitespace to the output below to piece out
what is going on; I believe any parenthesis errors are due to my tinkering.
Beyond that, the output is as returned by Solr.  I set the slop values so
that the PF/PF2/PF3 clauses can be told apart: pf fields have a slop ending
in 0, pf2 fields in 1, and pf3 fields in 2 (e.g. ~10, ~11, ~12).
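The slop convention above can be checked mechanically. The following is a minimal Python sketch, not anything Solr provides: the function name `group_phrase_clauses` and the regular expression are made up for illustration. It assumes the parsed query is available as a string, with or without the JSON-escaped quotes (\") seen in the debug output. Phrase clauses with no explicit slop are not matched.

```python
import re
from collections import defaultdict

# Matches field:"phrase"~slop, with or without JSON-escaped quotes (\").
PHRASE_CLAUSE = re.compile(r'(\w+):\\?"([^"\\]+)\\?"~(\d+)')

# Convention from this post: slop ending in 0 = pf, 1 = pf2, 2 = pf3.
SLOP_DIGIT_TO_PARAM = {0: "pf", 1: "pf2", 2: "pf3"}


def group_phrase_clauses(parsed_query):
    """Group sloppy phrase clauses by the parameter that produced them."""
    groups = defaultdict(list)
    # The debug output is hard-wrapped, so collapse whitespace first.
    text = " ".join(parsed_query.split())
    for field, phrase, slop in PHRASE_CLAUSE.findall(text):
        param = SLOP_DIGIT_TO_PARAM.get(int(slop) % 10, "other")
        groups[param].append((field, phrase, int(slop)))
    return dict(groups)
```

Running this over a parsedquery_toString value and checking which of the keys "pf", "pf2", and "pf3" are absent gives a quick way to spot the missing clauses without reading the whole string by eye.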

=============
Example 1:  "aspirin dose in rats"
==============

With query-time synonyms:
===============
/// Q terms generate as expected ///
+((((kw1:\"acetylsalicylic acid\" kw1:aspirin)^100.0 |
(species:\"acetylsalicylic acid\" species:aspirin) |
(keywords_bm25_no_norms:\"acetylsalicylic acid\"
keywords_bm25_no_norms:aspirin)^50.0 | (description:\"acetylsalicylic
acid\" description:aspirin) | (kw1ranked:\"acetylsalicylic acid\"
kw1ranked:aspirin)^100.0 | (text:\"acetylsalicylic acid\" text:aspirin) |
(title:\"acetylsalicylic acid\" title:aspirin)^100.0 |
(keywordsranked_bm25_no_norms:\"acetylsalicylic acid\"
keywordsranked_bm25_no_norms:aspirin)^50.0 | (authors:\"acetylsalicylic
acid\" authors:aspirin))~0.4 ((Synonym(kw1:dosage kw1:dose kw1:dose
kw1:dose))^100.0 | Synonym(species:dosage species:dose species:dose
species:dose) | (Synonym(keywords_bm25_no_norms:dosage
keywords_bm25_no_norms:dose keywords_bm25_no_norms:dose
keywords_bm25_no_norms:dose))^50.0 | Synonym(description:dosage
description:dose description:dose description:dose) |
(Synonym(kw1ranked:dosage kw1ranked:dose kw1ranked:dose
kw1ranked:dose))^100.0 | Synonym(text:dosage text:dose text:dose text:dose)
| (Synonym(title:dosage title:dose title:dose title:dose))^100.0 |
(Synonym(keywordsranked_bm25_no_norms:dosage
keywordsranked_bm25_no_norms:dose keywordsranked_bm25_no_norms:dose
keywordsranked_bm25_no_norms:dose))^50.0 | Synonym(authors:dosage
authors:dose authors:dose authors:dose))~0.4 ((Synonym(kw1:rat
kw1:rattu))^100.0 | Synonym(species:rat species:rattu) |
(Synonym(keywords_bm25_no_norms:rat keywords_bm25_no_norms:rattu))^50.0 |
Synonym(description:rat description:rattu) | (Synonym(kw1ranked:rat
kw1ranked:rattu))^100.0 | Synonym(text:rat text:rattu) | (Synonym(title:rat
title:rattu))^100.0 | (Synonym(keywordsranked_bm25_no_norms:rat
keywordsranked_bm25_no_norms:rattu))^50.0 | Synonym(authors:rat
authors:rattu))~0.4)~3)

/// PF and PF2 are missing. ///
 () () () () ()

/// This is actually PF3, missing the ? where the stopword 'in' belonged. ///
 ((title:\"(dosage dose dose dose) (rattu rat)\"~22)^1000.0 |
(keywordsranked_bm25_no_norms:\"(dosage dose dose dose) (rattu
rat)\"~22)^1000.0 | (text:\"(dosage dose dose dose) (rattu
rat)\"~22)^100.0)~0.4 ((keywords_bm25_no_norms:\"(dosage dose dose dose)
(rattu rat)\"~12)^500.0 | (kw1ranked:\"(dosage dose dose dose) (rattu
rat)\"~12)^100.0 | (kw1:\"(dosage dose dose dose) (rattu
rat)\"~12)^100.0)~0.4,product(max(10.0/(3.16E-11*float(ms(const(1555545600000),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",

With index-time synonyms:
===============

/// Q ///
 "boost(+((((kw1:aspirin)^100.0 | species:aspirin |
(keywords_bm25_no_norms:aspirin)^50.0 | description:aspirin |
(kw1ranked:aspirin)^100.0 | text:aspirin | (title:aspirin)^100.0 |
(keywordsranked_bm25_no_norms:aspirin)^50.0 | authors:aspirin)~0.4
((kw1:dose)^100.0 | species:dose | (keywords_bm25_no_norms:dose)^50.0 |
description:dose | (kw1ranked:dose)^100.0 | text:dose | (title:dose)^100.0
| (keywordsranked_bm25_no_norms:dose)^50.0 | authors:dose)~0.4
((kw1:rats)^100.0 | species:rats | (keywords_bm25_no_norms:rats)^50.0 |
description:rats | (kw1ranked:rats)^100.0 | text:rats | (title:rats)^100.0
| (keywordsranked_bm25_no_norms:rats)^50.0 | authors:rats)~0.4)~3)
/// PF  ///
  ((title:\"aspirin dose ? rats\"~20)^5000.0 |
(keywordsranked_bm25_no_norms:\"aspirin dose ? rats\"~20)^5000.0 |
(keywords_bm25_no_norms:\"aspirin dose ? rats\"~20)^1500.0 |
(text:\"aspirin dose ? rats\"~20)^1000.0)~0.4 ((kw1ranked:\"aspirin dose ?
rats\"~10)^5000.0 | (kw1:\"aspirin dose ? rats\"~10)^500.0)~0.4
((authors:\"aspirin dose ? rats\")^250.0 | description:\"aspirin dose ?
rats\")~0.4 ((text:\"aspirin dose ? rats\"~100)^500.0)~0.4

/// PF2 ///
  (authors:\"aspirin dose\"~11 | species:\"aspirin dose\"~11)~0.4

/// PF3 ///
(((title:\"aspirin dose\"~22)^1000.0 |
(keywordsranked_bm25_no_norms:\"aspirin dose\"~22)^1000.0 | (text:\"aspirin
dose\"~22)^100.0)~0.4 ((title:\"dose ? rats\"~22)^1000.0 |
(keywordsranked_bm25_no_norms:\"dose ? rats\"~22)^1000.0 | (text:\"dose ?
rats\"~22)^100.0)~0.4) (((keywords_bm25_no_norms:\"aspirin dose\"~12)^500.0
| (kw1ranked:\"aspirin dose\"~12)^100.0 | (kw1:\"aspirin
dose\"~12)^100.0)~0.4 ((keywords_bm25_no_norms:\"dose ? rats\"~12)^500.0 |
(kw1ranked:\"dose ? rats\"~12)^100.0 | (kw1:\"dose ?
rats\"~12)^100.0)~0.4),product(max(10.0/(3.16E-11*float(ms(const(1555545600000),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",


===============
Example 2: "allergic reaction dogs"
The underlying issue isn't specific to any one of PF, PF2, or PF3.  The
following example generates PF2, but not PF or PF3.
===============

With query-time synonyms:
///  Q ///
parsedquery_toString":"boost(
+((((Synonym(kw1:allergic kw1:allergy kw1:hypersensitive
kw1:hypersensitive))^100.0 | Synonym(species:allergic species:allergy
species:hypersensitive species:hypersensitive) |
(Synonym(keywords_bm25_no_norms:allergic keywords_bm25_no_norms:allergy
keywords_bm25_no_norms:hypersensitive
keywords_bm25_no_norms:hypersensitive))^50.0 | Synonym(description:allergic
description:allergy description:hypersensitive description:hypersensitive)
| (Synonym(kw1ranked:allergic kw1ranked:allergy kw1ranked:hypersensitive
kw1ranked:hypersensitive))^100.0 | Synonym(text:allergic text:allergy
text:hypersensitive text:hypersensitive) | (Synonym(title:allergic
title:allergy title:hypersensitive title:hypersensitive))^100.0 |
(Synonym(keywordsranked_bm25_no_norms:allergic
keywordsranked_bm25_no_norms:allergy
keywordsranked_bm25_no_norms:hypersensitive
keywordsranked_bm25_no_norms:hypersensitive))^50.0 |
Synonym(authors:allergic authors:allergy authors:hypersensitive
authors:hypersensitive))~0.4 ((kw1:reaction)^100.0 | species:reaction |
(keywords_bm25_no_norms:reaction)^50.0 | description:reaction |
(kw1ranked:reaction)^100.0 | text:reaction | (title:reaction)^100.0 |
(keywordsranked_bm25_no_norms:reaction)^50.0 | authors:reaction)~0.4
((kw1:\"cani familiari\" kw1:canine kw1:\"k 9\" kw1:\"cani lupu familiari\"
kw1:dog)^100.0 | (species:\"cani familiari\" species:canine species:\"k 9\"
species:\"cani lupu familiari\" species:dog) |
(keywords_bm25_no_norms:\"cani familiari\" keywords_bm25_no_norms:canine
keywords_bm25_no_norms:\"k 9\" keywords_bm25_no_norms:\"cani lupu
familiari\" keywords_bm25_no_norms:dog)^50.0 | (description:\"cani
familiari\" description:canine description:\"k 9\" description:\"cani lupu
familiari\" description:dog) | (kw1ranked:\"cani familiari\"
kw1ranked:canine kw1ranked:\"k 9\" kw1ranked:\"cani lupu familiari\"
kw1ranked:dog)^100.0 | (text:\"cani familiari\" text:canine text:\"k 9\"
text:\"cani lupu familiari\" text:dog) | (title:\"cani familiari\"
title:canine title:\"k 9\" title:\"cani lupu familiari\" title:dog)^100.0 |
(keywordsranked_bm25_no_norms:\"cani familiari\"
keywordsranked_bm25_no_norms:canine keywordsranked_bm25_no_norms:\"k 9\"
keywordsranked_bm25_no_norms:\"cani lupu familiari\"
keywordsranked_bm25_no_norms:dog)^50.0 | (authors:\"cani familiari\"
authors:canine authors:\"k 9\" authors:\"cani lupu familiari\"
authors:dog))~0.4)~3)

/// PF ///
() () () ()

/// PF2 ///
(authors:\"(hypersensitive allergy hypersensitive allergic) reaction\"~11 |
species:\"(hypersensitive allergy hypersensitive allergic)
reaction\"~11)~0.4

/// PF3 ///
() (),
product(max(10.0/(3.16E-11*float(ms(const(1555545600000),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",

With index-time synonyms:
/// Q ///
+((((kw1:allergic)^100.0 | species:allergic |
(keywords_bm25_no_norms:allergic)^50.0 | description:allergic |
(kw1ranked:allergic)^100.0 | text:allergic | (title:allergic)^100.0 |
(keywordsranked_bm25_no_norms:allergic)^50.0 | authors:allergic)~0.4
((kw1:reaction)^100.0 | species:reaction |
(keywords_bm25_no_norms:reaction)^50.0 | description:reaction |
(kw1ranked:reaction)^100.0 | text:reaction | (title:reaction)^100.0 |
(keywordsranked_bm25_no_norms:reaction)^50.0 | authors:reaction)~0.4
((kw1:dog)^100.0 | species:dog | (keywords_bm25_no_norms:dog)^50.0 |
description:dog | (kw1ranked:dog)^100.0 | text:dog | (title:dog)^100.0 |
(keywordsranked_bm25_no_norms:dog)^50.0 | authors:dog)~0.4)~3)

/// PF ///
((title:\"allergic reaction dog\"~20)^5000.0 |
(keywordsranked_bm25_no_norms:\"allergic reaction dog\"~20)^5000.0 |
(keywords_bm25_no_norms:\"allergic reaction dog\"~20)^1500.0 |
(text:\"allergic reaction dog\"~20)^1000.0)~0.4 ((kw1ranked:\"allergic
reaction dog\"~10)^5000.0 | (kw1:\"allergic reaction dog\"~10)^500.0)~0.4
((authors:\"allergic reaction dog\")^250.0 | description:\"allergic
reaction dog\")~0.4 ((text:\"allergic reaction dog\"~100)^500.0)~0.4

/// PF2 ///
((authors:\"allergic reaction\"~11 | species:\"allergic reaction\"~11)~0.4
(authors:\"reaction dog\"~11 | species:\"reaction dog\"~11)~0.4)

/// PF3 ///
((title:\"allergic reaction dog\"~22)^1000.0 |
(keywordsranked_bm25_no_norms:\"allergic reaction dog\"~22)^1000.0 |
(text:\"allergic reaction dog\"~22)^100.0)~0.4
((keywords_bm25_no_norms:\"allergic reaction dog\"~12)^500.0 |
(kw1ranked:\"allergic reaction dog\"~12)^100.0 | (kw1:\"allergic reaction
dog\"~12)^100.0)~0.4,product(max(10.0/(3.16E-11*float(ms(const(1555545600000),date(dateint)))+6.0),int(documentdatefix)),scale(map(int(rank),-1.0,-1.0,const(0.5),null),0.5,2.0)))",


I'm working on getting this rigged up in the debugger, but would appreciate
any feedback.
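
For anyone who wants to try reproducing this, here is a rough Python sketch of how a debug request along these lines could be built. It is an approximation, not the actual request: the host, the collection name, and the helper `build_debug_url` are hypothetical, only a subset of the fields is listed, and the qf/pf/pf2/pf3 and tie values are inferred from the boosts, slops, and ~0.4 in the parsed queries above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute your own host and collection.
BASE_URL = "http://localhost:8983/solr/mycollection/select"


def build_debug_url(query, base_url=BASE_URL):
    """Build an edismax request URL that returns the parsed query."""
    params = [
        ("q", query),
        ("defType", "edismax"),
        # Subset of fields, with boosts as they appear in the output above.
        ("qf", "title^100 text"),
        # field~slop^boost; slop ends in 0 for pf, 1 for pf2, 2 for pf3.
        ("pf", "title~20^5000 text~20^1000"),
        ("pf2", "authors~11 species~11"),
        ("pf3", "title~22^1000 text~22^100"),
        ("tie", "0.4"),
        ("debugQuery", "true"),
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)
```

Calling build_debug_url("aspirin dose in rats") against each version of the field type and comparing the parsedquery_toString values should show whether the pf and pf3 clauses come back empty.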

Thank you,
Elizabeth
