Well, I fixed my own problem in the end. For the record, this is the
schema I ended up going with:

<fieldType name="text_bigram" class="solr.TextField" omitNorms="true">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.NGramFilterFactory" maxGramSize="2" minGramSize="2" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.NGramFilterFactory" maxGramSize="2" minGramSize="2" />
    </analyzer>
</fieldType>

I could have left it as a trigram, but I went with a bigram because, with
this setup, queries hit properly as long as the query text meets the
min/max gram size. In other words, any query two or more characters long
works for me; anything shorter than two characters fails.

I don't know exactly why that is, but I'll take it anyway!
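
One likely explanation for the short-query failure, sketched in Python
rather than Solr: an n-gram filter with minGramSize="2" has nothing to
emit for a one-character token, so the analyzed query ends up with no
terms at all. The analyze() function below is a hypothetical toy model of
the whitespace / lowercase / n-gram chain above, not Solr's actual code.

```python
def analyze(text, min_gram=2, max_gram=2):
    """Toy model: split on whitespace, lowercase, then n-gram each token."""
    grams = []
    for token in text.split():
        token = token.lower()
        for size in range(min_gram, max_gram + 1):
            # A token shorter than `size` yields an empty range, so no grams.
            for start in range(len(token) - size + 1):
                grams.append(token[start:start + size])
    return grams


indexed_terms = set(analyze("ABC-123456-SUN"))

print(analyze("45"))                        # ['45']
print(analyze("4"))                         # []
print(set(analyze("45")) <= indexed_terms)  # True
```

Under that assumption, a two-character query produces exactly one bigram
that is also in the index, while a one-character query produces none.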

- Charlie


-----Original Message-----
From: Charlie Jackson [mailto:charlie.jack...@cision.com] 
Sent: Friday, October 23, 2009 10:00 AM
To: solr-user@lucene.apache.org
Subject: NGram query failing

I have a requirement to be able to find hits within words in a free-form
id field. The field can have any type of alphanumeric data - it's as
likely it will be something like "123456" as it is to be "SUN-123-ABC".
I thought of using NGrams to accomplish the task, but I'm having a
problem. I set up a field like this:

 

    <fieldType name="text_trigram" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <tokenizer class="solr.NGramTokenizerFactory" minGramSize="1" maxGramSize="3"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>

 

After indexing a field like this, the analysis page indicates my queries
should work. If I give it a sample field value of "ABC-123456-SUN" and a
query value of "45" it shows hits in several places, which is what I
expected.

 

However, when I actually query the field with something like "45" I get
no hits back. Looking at the debugQuery output, it looks like it's
taking my analyzed query text and putting it into a phrase query. So,
for a query of "45" it turns into a phrase query of <field>:"4 5 45"
which then doesn't hit on anything in my index.
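
To illustrate the mismatch, here is a rough Python model (not Solr code).
It assumes the tokenizer emits all grams of one size before moving to the
next size, and that every gram takes the next token position; both are
assumptions chosen to match the "4 5 45" order shown by debugQuery. The
helpers ngram_tokenize() and phrase_matches() are hypothetical names.

```python
def ngram_tokenize(text, min_gram=1, max_gram=3):
    """Toy model: all grams of size 1, then size 2, then size 3, lowercased."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size].lower())
    return grams


def phrase_matches(indexed, phrase):
    """True if `phrase` occurs as consecutive tokens in `indexed`."""
    n = len(phrase)
    return any(indexed[i:i + n] == phrase for i in range(len(indexed) - n + 1))


indexed = ngram_tokenize("ABC-123456-SUN")
query = ngram_tokenize("45")

print(query)                                   # ['4', '5', '45']
print(all(term in indexed for term in query))  # True
print(phrase_matches(indexed, query))          # False
```

In this model every query term exists in the index, which fits what the
analysis page reports, but the three terms never sit at consecutive
positions, so a phrase query over them cannot match.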

 

What am I missing to make this work?

 

- Charlie
