Be aware that if you search a field with stemming, the index will only contain the stems, e.g. «cars» and «caring» may both be indexed as «car». When you do a wildcard search, most of the analysis, and in particular stemming, is skipped, so you are only targeting the exact tokens that happen to be in that field. Thus a search for «ca*s», «c*ing» or «cars*» will not match, but «car*» and even «c*r» will match both these words, which would be surprising, right? So if wildcard search is a key feature, you had better add a copyField to a field whose fieldType in your schema does not do stemming, probably only StandardTokenizer and LowerCaseFilter. Then use that field for your wildcard queries instead of the generic stemmed field.
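A toy Python model (not Solr code) makes this concrete. It assumes three simplifications. First, a hypothetical lookup table stands in for the stemmer and only mirrors the example words above. Second, analysis is just whitespace splitting plus lowercasing. Third, `fnmatch` approximates Lucene's `*`/`?` wildcard semantics, which is close enough for patterns that use only those two characters.

The model runs the same patterns against the stemmed field and against an unstemmed copy.

```python
import fnmatch

# Hypothetical stand-in for a stemming filter: a lookup table holding only the
# example words. A real field would use an actual stemmer (e.g. Porter).
TOY_STEMS = {"cars": "car", "caring": "car"}


def analyze(text: str, stem: bool) -> list[str]:
    """Toy index-time analysis: whitespace tokenize, lowercase, optionally stem."""
    tokens = [t.lower() for t in text.split()]
    return [TOY_STEMS.get(t, t) for t in tokens] if stem else tokens


def wildcard_match(pattern: str, indexed_tokens: list[str]) -> bool:
    """Wildcard terms are not stemmed: the raw pattern is compared against
    whatever tokens ended up in the index (whole-token match)."""
    return any(fnmatch.fnmatchcase(tok, pattern) for tok in indexed_tokens)


doc = "cars caring"
stemmed_field = analyze(doc, stem=True)    # ['car', 'car']
unstemmed_copy = analyze(doc, stem=False)  # ['cars', 'caring'], i.e. the copyField target

for pattern in ["ca*s", "c*ing", "cars*", "car*", "c*r"]:
    print(pattern, wildcard_match(pattern, stemmed_field), wildcard_match(pattern, unstemmed_copy))
# ca*s False True
# c*ing False True
# cars* False True
# car* True True
# c*r True False
```

Against the unstemmed copy, every pattern behaves the way a user would expect, including «c*r» no longer matching.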
Jan

> On 13 Feb 2020 at 13:52, Fischer, Stephen <sfisc...@pennmedicine.upenn.edu> wrote:
>
> Folks,
>
> I am seeing very strange (bad) wildcard behavior (Solr 8).
>
> "kinase" finds hits as expected.
>
> "kin*ase" and "kin*se" find 0 results. "kinase*" matches only values like
> "kinase," and "kinase-" but not "kinase".
>
> I have done the analysis as Erick suggested (thanks!), but it is not helping
> me understand why we'd have this problem.
>
> I have put together 12 screenshots from the Solr web UI that show in detail:
> - the queries I ran to get the results above
> - various analyses trying to understand why
> - the schema for the fieldType in question
>
> https://docs.google.com/presentation/d/10fIAesqkTnvmJBFaerEhnqWhSiaEvVW7u9jE1nX564Q/edit?usp=sharing
>
> thanks,
> steve
>
> -----Original Message-----
> From: Sotiris Fragkiskos <sfra...@gmail.com>
> Sent: Thursday, February 13, 2020 4:03 AM
> To: solr-user@lucene.apache.org
> Subject: [External] Re: wildcards match end-of-word?
>
> Hi Erick,
> thanks very much for this information, it was immensely useful; I always had
> the same question!
> I'm now seeing the Analysis page, and finally I don't have to rely on an
> external online stemmer to see what Solr *probably* stemmed the term to!!
> But I still can't make the asterisk and question mark work inside the term,
> even in the earlier parts of it, e.g. tr?ining.
> I would expect it to match train, but it doesn't.
> PSF at the end just shows t | ain;
> every line before that actually shows t | aining (ST, SF, SF, LCF, EPF, SKMF). Am I
> doing something very wrong??
>
> thanks again!
> Sotiri
>
> On Wed, Feb 12, 2020 at 1:44 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Steve:
>>
>> You _really_ want to get acquainted with the admin UI/Analysis page ;).
>> Choose a core/collection and you should see the choice. It shows you
>> exactly what transformations your data goes through.
>> If you hover over the light gray pairs of letters, you’ll get a tooltip showing you what
>> part of your analysis chain is responsible for a particular change. I
>> un-check the “verbose” box 95% of the time BTW.
>>
>> The critical bit is that what comes out of the end of the analysis
>> pipe are the tokens that are actually _in_ the index. From there,
>> problems like this make more sense.
>>
>> My bet is that, as Walter says, you have a stemmer in the analysis
>> chain and the actual token in the index is “kinas”, so of course
>> “kinase*” won’t be found. By adding OR kinase to the query, that token
>> is stemmed to “kinas” and matches.
>>
>> Also, adding &debug=query to your URL will show you what the query
>> looks like after parsing and analysis, also a major tool for figuring
>> out what’s really happening.
>>
>> Wildcards are not stemmed, which can lead to surprising results.
>> There’s no perfect answer here. Let’s claim wildcards _were_ stemmed.
>> Then you’d have to try to explain why “running*” returned a doc with
>> only “run” or “runner” or “runs” or... in it, but searching for
>> “runnin*” did not, due to the stemmer not recognizing it as a stemmable word.
>>
>> Finally, one of my personal hot buttons is wildcards in general.
>> They’re very often over-used because people are used to simple search
>> capabilities.
>> Something about “if your only tool is a hammer, every problem looks
>> like a nail”. That gets into training users too though...
>>
>> Best,
>> Erick
>>
>>> On Feb 11, 2020, at 9:24 PM, Fischer, Stephen <sfisc...@pennmedicine.upenn.edu> wrote:
>>>
>>> Hi,
>>>
>>> I am a Solr newbie. I was surprised to discover that a search for
>>> kinase* returned fewer results than kinase.
>>>
>>> Then I read the wildcard documentation
>>> <https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html#TheStandardQueryParser-WildcardSearches>
>>> and saw why: kinase* will not match the word "kinase".
>>>
>>> Our end-users won't expect this behavior. Presumably the solution would
>>> be for them (actually us, on their behalf) to use kinase* OR kinase.
>>>
>>> But that is kind of a hack.
>>>
>>> Is there a way we can configure Solr to have wildcards match on
>>> end-of-word?
>>>
>>> Thanks,
>>> Steve
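Erick's explanation of why `kinase* OR kinase` works can be sketched with a toy Python model. It is not Solr's query parser. Plain terms are stemmed at query time just as at index time, wildcard terms are compared raw, and `OR` takes the union.

Several parts of the model are hypothetical simplifications:
- the stemmer table,
- the per-document tokens,
- the naive `" OR "` splitting.

The `doc2` case is an assumption: it offers one possible explanation for the `"kinase,"` hits. It supposes a tokenizer that leaves trailing punctuation on the token, so the stemmer does not touch it. On a real installation, `&debug=query` shows what the query actually became.

```python
import fnmatch

# Hypothetical toy stemmer: only knows the words used in this thread.
TOY_STEMS = {"kinase": "kinas", "training": "train"}


def stem(term: str) -> str:
    """Toy analysis for plain terms: lowercase, then stem."""
    term = term.lower()
    return TOY_STEMS.get(term, term)


def matches(query: str, indexed: set[str]) -> bool:
    """Evaluate a toy 'a OR b OR ...' query against one document's indexed tokens.

    Plain terms go through the same (toy) analysis as the index; terms containing
    * or ? are compared raw, because wildcards are not stemmed.
    """
    for clause in query.split(" OR "):
        if "*" in clause or "?" in clause:
            if any(fnmatch.fnmatchcase(tok, clause) for tok in indexed):
                return True
        elif stem(clause) in indexed:
            return True
    return False


# Tokens each hypothetical document left in the stemmed field.
docs = {
    "doc1": {stem("kinase")},    # 'kinas'
    "doc2": {"kinase,"},         # assumed: punctuation kept, so the stemmer left it alone
    "doc3": {stem("training")},  # 'train'
}

for q in ["kinase*", "kinase* OR kinase", "tr?ining"]:
    hits = sorted(name for name, tokens in docs.items() if matches(q, tokens))
    print(q, "->", hits)
# kinase* -> ['doc2']
# kinase* OR kinase -> ['doc1', 'doc2']
# tr?ining -> []
```

The last query mirrors the tr?ining case. The index holds «train», and `?` must match exactly one character, so the raw pattern cannot line up with the stem. Against an unstemmed copy holding «training», it would match.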