FYI
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: [email protected]

*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter <https://twitter.com/seaseltd> | Youtube <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github <https://github.com/seaseltd>

On Fri, 9 Sept 2022 at 15:49, Alessandro Benedetti <[email protected]> wrote:

> Not related to the word-delimiter token filter, but I did a study a while
> ago on the sow parameter, identified a couple of bugs and fixed one (the
> other was discussed and in the end not accepted as an improvement, as it
> was controversial).
>
> https://sease.io/2021/05/apache-solr-sow-parameter-split-on-whitespace-and-multi-field-full-text-search.html
>
> Cheers
>
> On Wed, 7 Sept 2022 at 14:19, Markus Jelsma <[email protected]> wrote:
>
>> Hello Stephen,
>>
>> Using Solr 8.8.1, I tried to reproduce your strange problem: I copied
>> your schema and indexed a single document. As expected, I got exactly
>> one result for all four combinations, using both the default Lucene
>> QParser and the Edismax QParser.
>>
>> So it appears to work just fine here on 8.8.1. The WordDelimiterGraph
>> filter is relatively new and has had only a few issues. Maybe you can
>> try to see if it works without the Graph-type token filters, using the
>> old WordDelimiter filter. That one is tried and tested.
>>
>> Regards,
>> Markus
>>
>> On Fri, 2 Sep 2022 at 21:57, Stephen Lewis Bianamara <[email protected]> wrote:
>>
>> > Hey Solr Users,
>> >
>> > I've noticed an odd interaction between the word delimiter graph
>> > filter and the sow parameter. When the word delimiter graph filter is
>> > invoked and sow=true, it is possible to miss results that involve
>> > alphanumeric splitting but aren't exact matches. So if I have a
>> > document with "ABC123 DEF456_GHI", the combination of sow=true and
>> > WordDelimiterGraph seems to break queries for "def456". See full
>> > repro below.
>> >
>> > I believe this is a bug. Could someone please take a look and confirm
>> > my repro, or let me know if something is misconfigured here?
>> >
>> > *Repro*
>> >
>> > - Solr 9 with this field type definition for field "test_en"
>> >
>> > <fieldType name="text_en" class="solr.TextField"
>> >            positionIncrementGap="100" autoGeneratePhraseQueries="true">
>> >   <analyzer type="index">
>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >     <filter class="solr.WordDelimiterGraphFilterFactory"
>> >             generateWordParts="1" generateNumberParts="1"
>> >             catenateAll="1" preserveOriginal="1"
>> >             splitOnCaseChange="1"/>
>> >     <filter class="solr.FlattenGraphFilterFactory"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.SnowballPorterFilterFactory"/>
>> >   </analyzer>
>> >   <analyzer type="query">
>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >     <filter class="solr.WordDelimiterGraphFilterFactory"
>> >             generateWordParts="1" generateNumberParts="1"
>> >             catenateAll="1" preserveOriginal="1"
>> >             splitOnCaseChange="1"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.SnowballPorterFilterFactory"/>
>> >   </analyzer>
>> > </fieldType>
>> >
>> > - Create document {"id": 1, "test_en": ["ABC123 DEF456_GHI"]}
>> > - Query the following; all should hit, but one combination misses
>> >    - sow=true, q=def456
>> >       - misses
>> >    - sow=true, q=abc123
>> >       - hits
>> >    - sow=false, q=def456
>> >       - hits
>> >    - sow=false, q=abc123
>> >       - hits
>> >
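
For anyone who wants to re-run the four sow/q combinations from the repro on different Solr versions, here is a minimal Python sketch that only builds the request URLs. The host, port and collection name ("test") are assumptions and not from the thread. The thread does not say which query parser the original report used, so the parser is a parameter here (defaulting to the standard Lucene parser), and the queried field is passed as `df`.

```python
from itertools import product
from urllib.parse import urlencode

# Assumption: a local Solr with a collection named "test"; adjust as needed.
SOLR_SELECT = "http://localhost:8983/solr/test/select"


def repro_query_urls(base_url=SOLR_SELECT, field="test_en", def_type="lucene"):
    """Return one select URL per (sow, term) combination from the repro."""
    urls = {}
    for sow, term in product(("true", "false"), ("def456", "abc123")):
        params = {
            "q": term,            # bare term, searched in the default field
            "df": field,          # default field from the repro document
            "sow": sow,           # split-on-whitespace switch under test
            "defType": def_type,  # pass "edismax" to test the other parser
        }
        urls[(sow, term)] = f"{base_url}?{urlencode(params)}"
    return urls


if __name__ == "__main__":
    for (sow, term), url in repro_query_urls().items():
        print(f"sow={sow} q={term}: {url}")
```

Each printed URL can be fetched with curl or a browser. Comparing numFound across the four responses should show whether the sow=true, q=def456 case misses on a given version.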
