LetterTokenizer + EdgeNGram + apostrophe in query = invalid result

Matt Weber Fri, 25 Feb 2011 01:04:26 -0800

I have the following field defined in my schema:

  <fieldType name="ngram" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.LetterTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.LetterTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>


  <field name="person" type="ngram" indexed="true" stored="true" />

I have the default field set to "person" and have indexed the
following document:

<add>
       <doc>
               <field name="id"><![CDATA[1001116609]]></field>
               <field name="person"><![CDATA[Vincent M D'Onofrio]]></field>
       </doc>
</add>


The following queries return the result as expected using the standard
request handler:

vincent m d onofrio
d'o
onofrio
d onofrio

The following query fails:

d'onofrio

This is weird because "d'o" returns a result.  As soon as I type the
"n", I stop getting results.  I ran this through the field analysis
page, and it shows that the query is being tokenized correctly and
should yield a result.
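
To illustrate what I expect from the analysis, here is a minimal Python
sketch that only approximates the analyzer chains above.  It is not Solr
code, it ignores token positions and offsets, and the function names are
made up for illustration.  Both query terms show up among the indexed
terms, which is why I expected a match:

```python
import re

def letter_tokenize(text):
    # Rough stand-in for LetterTokenizer: keep runs of letters, split on everything else.
    return re.findall(r"[^\W\d_]+", text)

def lowercase(tokens):
    # Rough stand-in for LowerCaseFilter.
    return [t.lower() for t in tokens]

def edge_ngrams(tokens, min_gram=1, max_gram=25):
    # Rough stand-in for EdgeNGramFilter: front-edge prefixes of each token.
    grams = []
    for t in tokens:
        for n in range(min_gram, min(len(t), max_gram) + 1):
            grams.append(t[:n])
    return grams

def index_terms(text):
    # Index-time chain: tokenize, lowercase, edge n-grams.
    return edge_ngrams(lowercase(letter_tokenize(text)))

def query_terms(text):
    # Query-time chain: tokenize, lowercase (no n-grams).
    return lowercase(letter_tokenize(text))

indexed = set(index_terms("Vincent M D'Onofrio"))
for q in ["d'o", "d'onofrio"]:
    terms = query_terms(q)
    # Print the query, its terms, and whether every term is in the index.
    print(q, terms, all(t in indexed for t in terms))
```

This only checks term membership, so it says nothing about how the query
parser combines the two terms it gets from "d'onofrio".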

I am using a build of trunk Solr (r1073990) and the example
solrconfig.xml.  I am also using the example schema with the addition
of my ngram field.

Any ideas?  I have tried this with other words containing an
apostrophe, and they all stop returning results after 4 characters.


Thanks,
Matt Weber
