edge_ngram and short words containing digits

Kevin Murphy Wed, 28 May 2014 15:20:07 -0700

Hi,

I’m using Django Haystack 2.1.0 with Solr 4.8.1 in an autocomplete
application. I’ve noticed that words containing digits are not being matched.
Examples are ‘B2B’, ‘PSG4’, and ‘5S_rRNA’. These words match up to the first
digit and stop matching from the digit onward.

Below is what I believe is the relevant part of the Haystack-generated Solr
schema.xml. Let me know if you need to see more.

COPB2

    <fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0" catenateAll="0"
                splitOnCaseChange="1"/>
        <filter class="solr.EdgeNGramFilterFactory"
                minGramSize="2" maxGramSize="15" side="front" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0" catenateAll="0"
                splitOnCaseChange="1"/>
      </analyzer>
    </fieldType>

Can I get this to work by tweaking the WordDelimiterFilterFactory attributes 
somehow, or do I need to do something else?
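For reference, here is an untested sketch of what such an attribute tweak might look like. It assumes the cause is WordDelimiterFilterFactory's default behaviour of splitting at letter/digit transitions (splitOnNumerics defaults to 1), so that ‘b2b’ becomes ‘b’, ‘2’, ‘b’. Each of those parts is shorter than minGramSize="2", so EdgeNGramFilterFactory emits no grams for them. The only change from the schema above is splitOnNumerics="0" on both analyzers; everything else is copied as-is.

```xml
<fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- splitOnNumerics="0": do not split at letter/digit transitions,
         so "b2b" stays one token instead of "b", "2", "b" -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1" splitOnNumerics="0" />
    <!-- tokens shorter than minGramSize produce no grams at all -->
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="2" maxGramSize="15" side="front" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- same setting at query time so query tokens line up with indexed grams -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1" splitOnNumerics="0" />
  </analyzer>
</fieldType>
```

With this setting ‘b2b’ should be indexed as the grams ‘b2’ and ‘b2b’, and ‘5s_rrna’ should still split on the underscore into ‘5s’ and ‘rrna’. A schema change only affects documents indexed after the core is reloaded, so a reindex would be needed; the Analysis screen in the Solr admin UI shows the tokens produced at each stage and is a quick way to confirm what each filter emits.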

Thanks,
Kevin
