Hey Solr Users,
I've noticed an odd interaction between WordDelimiterGraphFilter and the sow
parameter. When the filter is invoked and sow=true, queries can miss results
that depend on alphanumeric splitting and aren't exact matches. For example,
if I have a document with "ABC123 DEF456_GHI", the combination of sow=true
and WordDelimiterGraphFilter seems to break queries for "def456". See the
full repro below.
I believe this is a bug. Could someone please take a look and confirm my
repro, or let me know if something is misconfigured here?
*Repro*
- Solr 9, with field "test_en" using this field type definition:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateAll="1" preserveOriginal="1"
            splitOnCaseChange="1"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateAll="1" preserveOriginal="1"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>
- Create document {"id": 1, "test_en": ["ABC123 DEF456_GHI"]}
- Query the following; all should hit, but one combination misses
  - sow=true, q=def456
    - misses
  - sow=true, q=abc123
    - hits
  - sow=false, q=def456
    - hits
  - sow=false, q=abc123
    - hits
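
The repro names the field "test_en" but does not show its declaration. For anyone trying to reproduce this, a minimal sketch of a matching schema entry might look like the following. The indexed, stored, and multiValued settings are assumptions (multiValued only because the sample document passes an array); they are not taken from the original setup.

```xml
<!-- Hypothetical field declaration for the repro; attribute values are
     assumptions, only the name and type come from the steps above. -->
<field name="test_en" type="text_en" indexed="true" stored="true"
       multiValued="true"/>
```

The queries above are unfielded (q=def456), so this sketch also assumes that "test_en" is the default search field, for example via df=test_en on the request.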