According to the fieldtype I posted previously, I think it's because of the following:

1. WhitespaceTokenizer splits the string "Clyde Phillips" into 2 tokens: "Clyde" and "Phillips".
2. EdgeNGramFilter receives the 2 tokens and creates edge n-grams for each token: "C" "Cl" "Cly" ... and "P" "Ph" "Phi" ...
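
In case it helps, here is a rough Python sketch of those two index-time steps. It is only an approximation of the analyzer chain in the fieldtype, not how Solr implements it: the function names `edge_ngrams` and `analyze_index` are made up for illustration, and the stop word filter is left out. Since the chain lowercases before building the n-grams, the indexed terms would presumably be "c", "cl", ... rather than "C", "Cl", ...

```python
import re


def edge_ngrams(token, min_gram=1, max_gram=25):
    """Return the leading-edge n-grams of a token, shortest first."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze_index(text):
    """Approximate the index analyzer: split on whitespace, lowercase,
    strip everything outside a-z, then build edge n-grams per token.
    Assumption: the stop word filter is omitted for brevity."""
    grams = []
    for token in text.split():
        token = re.sub(r"[^a-z]", "", token.lower())
        if token:
            grams.extend(edge_ngrams(token))
    return grams


grams = analyze_index("Clyde Phillips")
print(grams)
# The list contains "cl" (from "clyde") but nothing derived from "bill".
print("cl" in grams, "bill" in grams)
```

Each whitespace-separated word gets its own set of prefixes, so a document is indexed under the prefixes of every word it contains, independently of the other words.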

The query string "Bill Cl" gets split into 2 tokens, "Bill" and "Cl", by the WhitespaceTokenizer. This creates a match between the 2nd query token, "Cl", and one of the "sub"tokens the EdgeNGramFilter created: "Cl".

-robert

On Nov 11, 2010, at 21:34, Andy wrote:

> Could anyone help me understand why "Clyde Phillips" appears in the
> results for "Bill Cl"?
>
> "Clyde Phillips" doesn't produce any EdgeNGram that would match "Bill Cl", so
> why is it even in the results?
>
> Thanks.
>
> --- On Thu, 11/11/10, Ahmet Arslan <iori...@yahoo.com> wrote:
>
>> You can add an additional field, using KeywordTokenizerFactory instead of
>> WhitespaceTokenizerFactory, and query both of these fields with an OR
>> operator:
>>
>> edgytext:(Bill Cl) OR edgytext2:"Bill Cl"
>>
>> You can even apply a boost so that begins-with matches come first.
>>
>> --- On Thu, 11/11/10, Robert Gründler <rob...@dubture.com> wrote:
>>
>>> From: Robert Gründler <rob...@dubture.com>
>>> Subject: EdgeNGram relevancy
>>> To: solr-user@lucene.apache.org
>>> Date: Thursday, November 11, 2010, 5:51 PM
>>>
>>> Hi,
>>>
>>> consider the following fieldtype (used for autocompletion):
>>>
>>> <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
>>>   <analyzer type="index">
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
>>>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
>>>   </analyzer>
>>>   <analyzer type="query">
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
>>>   </analyzer>
>>> </fieldType>
>>>
>>> This works fine as long as the query string is a single word. For
>>> multiple words, the ranking is weird though.
>>>
>>> Example:
>>>
>>> Query String: "Bill Cl"
>>>
>>> Result (in that order):
>>>
>>> - Clyde Phillips
>>> - Clay Rogers
>>> - Roger Cloud
>>> - Bill Clinton
>>>
>>> "Bill Clinton" should have the highest rank in that case.
>>>
>>> Does anyone have an idea how to configure this fieldtype to make
>>> matches in both tokens rank higher than those that match in either
>>> token?
>>>
>>> thanks!
>>>
>>> -robert