RE: Ngram Repeats

Feak, Todd Wed, 24 Dec 2008 07:51:39 -0800

It sounds like you want to get a list of "brands" that start with a particular 
string, out of your index. But your index is based on products, not brands. Is 
that correct?


If so, that has nothing to do with NGrams (or even tokenizing, for that matter).
I think you should be doing a facet query instead of a standard query. Take a 
look at faceting on the Solr Wiki.

http://wiki.apache.org/solr/SolrFacetingOverview
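
As a rough sketch of what such a request might look like, the snippet below
builds a select URL that facets on the brand field and keeps only the terms
starting with a given prefix. The host, the /solr/select path, and the field
name "brand" are assumptions for illustration, not taken from your setup;
facet, facet.field, facet.prefix and facet.mincount are standard Solr
faceting parameters.

```python
from urllib.parse import urlencode


def brand_prefix_facet_url(base_url, field, prefix):
    """Build a Solr select URL that facets on `field`, keeping only
    facet terms that start with `prefix`."""
    params = [
        ("q", "*:*"),             # match every product document
        ("rows", "0"),            # no product rows, only facet counts
        ("facet", "true"),
        ("facet.field", field),   # hypothetical field holding the brand name
        ("facet.prefix", prefix), # restrict facet terms to this prefix
        ("facet.mincount", "1"),  # skip brands with no matching products
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed local Solr instance and field name; adjust to your install.
    print(brand_prefix_facet_url("http://localhost:8983/solr/select", "brand", "as"))
```

Each brand then comes back once, with a count of the products that carry it.
Note that facet.prefix is matched against the indexed terms, so this would
probably work best on an untokenized copy of the brand field (lowercased if
you want case-insensitive prefixes) rather than on the ngram field.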

-Todd Feak
-----Original Message-----
From: Jeff Newburn [mailto:jnewb...@zappos.com] 
Sent: Wednesday, December 24, 2008 7:39 AM
To: solr-user@lucene.apache.org
Subject: Ngram Repeats

I have set up an ngram filter and have run into a problem.  Our index is
basically composed of products as the unique id.  Each product also has a
brand name assigned to it.  There are far fewer unique brand names than
products in the index.  I tried to set up an ngram based on the brand name
but it is returning the same brand name over and over for each product.
Essentially, if you try for the brand name starting with "as" you will get
the brand "asus" 15 times.  Is there a way to make the ngram only return
unique brand names?  I have attached the configuration below.

<fieldType name="prefix_token" class="solr.TextField" positionIncrementGap="1">
        <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
        </analyzer>
        <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
</fieldType>
-Jeff
