Re: EdgeNGram relevancy

Ahmet Arslan Thu, 11 Nov 2010 08:58:09 -0800

You can add an additional field that uses KeywordTokenizerFactory instead of 
WhitespaceTokenizerFactory, and query both fields with an OR operator:


edgytext:(Bill Cl) OR edgytext2:"Bill Cl"

You can even apply a boost to the second clause so that begins-with matches come first.
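
As a rough sketch of what the second field could look like in schema.xml (the type name "edgytext_keyword" and the source field "name" are made up for illustration; the StopFilter and PatternReplaceFilter from the original type are left out here to keep the example short, so adjust to taste):

```xml
<!-- Hypothetical field type: the whole value is kept as a single token,
     so the edge n-grams are prefixes of the full string, e.g. "bill cl" -->
<fieldType name="edgytext_keyword" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- "name" stands in for whatever field currently feeds edgytext -->
<field name="edgytext2" type="edgytext_keyword" indexed="true" stored="false"/>
<copyField source="name" dest="edgytext2"/>
```

With something like this in place, a boosted form of the query above might read edgytext:(Bill Cl) OR edgytext2:"Bill Cl"^10, where the factor 10 is arbitrary and would need tuning against real data.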

--- On Thu, 11/11/10, Robert Gründler <rob...@dubture.com> wrote:

> From: Robert Gründler <rob...@dubture.com>
> Subject: EdgeNGram relevancy
> To: solr-user@lucene.apache.org
> Date: Thursday, November 11, 2010, 5:51 PM
> Hi,
> 
> consider the following fieldtype (used for
> autocompletion):
> 
>   <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
>    <analyzer type="index">
>      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>      <filter class="solr.LowerCaseFilterFactory"/>
>      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>      <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
>      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
>    </analyzer>
>    <analyzer type="query">
>      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>      <filter class="solr.LowerCaseFilterFactory"/>
>      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>      <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
>    </analyzer>
>   </fieldType>
> 
> 
> This works fine as long as the query string is a single
> word. For multiple words, the ranking is weird though.
> 
> Example:
> 
> Query String: "Bill Cl"
> 
> Result (in that order):
> 
> - Clyde Phillips
> - Clay Rogers
> - Roger Cloud
> - Bill Clinton
> 
> "Bill Clinton" should have the highest rank in that
> case.  
> 
> Does anyone have an idea how to configure this fieldtype to
> make matches in both tokens rank higher than those that match
> in only one token?
> 
> 
> thanks!
> 
> 
> -robert
> 
> 
> 
>
