Hey Solr Users,

I've noticed an odd interaction between WordDelimiterGraphFilter and the sow
parameter. When the filter is invoked and sow=true, queries can miss results
that depend on alphanumeric splitting but aren't exact matches. For example,
if I have a document containing "ABC123 DEF456_GHI", the combination of
sow=true and WordDelimiterGraphFilter seems to break queries for "def456".
See the full repro below.

I believe this is a bug. Could someone please take a look at my repro and
confirm it, or let me know if something is misconfigured here?

*Repro*

   - Solr 9 with this field type definition, used by the field "test_en"

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateAll="1" preserveOriginal="1"
            splitOnCaseChange="1"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateAll="1" preserveOriginal="1"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>

   - Create document {"id": 1, "test_en": ["ABC123 DEF456_GHI"]}
   - Query the following; all should hit, but one combination misses
      - sow=true, q=def456
         - misses
      - sow=true, q=abc123
         - hits
      - sow=false, q=def456
         - hits
      - sow=false, q=abc123
         - hits
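
For anyone who wants to run the four combinations above, a small script can
generate the request URLs. This is only a sketch under assumptions that are
not part of the report: Solr listening on localhost:8983, a hypothetical
collection named "repro", and "test_en" passed as the default field via df.
debugQuery=true is added so the parsed query can be compared between
sow=true and sow=false.

```python
from itertools import product
from urllib.parse import urlencode

# Assumed endpoint: host, port, and the collection name "repro" are
# hypothetical and should be replaced with your own setup.
SOLR_SELECT = "http://localhost:8983/solr/repro/select"


def build_query_urls(base=SOLR_SELECT, field="test_en"):
    """Return one select URL per (sow, q) combination from the repro."""
    urls = []
    for sow, term in product(("true", "false"), ("def456", "abc123")):
        params = {
            "q": term,
            "df": field,           # search the field from the repro
            "sow": sow,            # split-on-whitespace toggle under test
            "debugQuery": "true",  # include the parsed query in the response
        }
        urls.append(f"{base}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    for url in build_query_urls():
        print(url)
```

Each printed URL can be pasted into a browser or fetched with curl; the
script itself does not contact Solr.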
