camel-casing and dismax troubles

Geoffrey Young Tue, 12 May 2009 16:19:34 -0700

hi all :)

I'm having trouble with camel-cased query strings and the dismax handler.


a user query

 LeAnn Rimes

isn't matching the indexed term

 Leann Rimes

even though both are lower-cased in the end.  furthermore, the
analysis tool shows a match.

the debug query looks like

 "parsedquery":"+((DisjunctionMaxQuery((search-en:\"(leann le) ann\")) DisjunctionMaxQuery((search-en:rimes)))~2) ()",

I have a feeling it's due to how the broken up tokens are added back
into the token stream with PreserveOriginal, and some strange
interaction between that order and dismax, but I'm not entirely sure.
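
to make that suspicion concrete, here is a rough Python sketch of what the parsed query above seems to imply. it is only a toy model, not Solr code: analyze() and phrase_matches() are made-up helpers, the splitter only handles lower-to-upper case changes, and it ignores the catenate options, synonyms and stopwords. the assumption is that the split tokens for "LeAnn" span two positions, so that chunk becomes a phrase query needing "ann" right after "leann" or "le", which the indexed "Leann Rimes" never has.

```python
import re

def analyze(text):
    """Toy stand-in for the analyzer chain: whitespace split, split on
    lower-to-upper case changes, keep the original token, lowercase."""
    tokens = []
    position = 0
    for word in text.split():
        # "LeAnn" -> ["Le", "Ann"]; "Leann" -> ["Leann"]
        parts = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", word).split()
        position += 1
        if len(parts) > 1:
            # preserved original shares the position of the first part
            tokens.append((position, word.lower()))
        for i, part in enumerate(parts):
            if i > 0:
                position += 1
            tokens.append((position, part.lower()))
    return tokens

def phrase_matches(query_tokens, index_tokens):
    """True if every query position finds one of its terms at the
    corresponding consecutive position in the indexed tokens."""
    slots = {}
    for pos, term in query_tokens:
        slots.setdefault(pos, set()).add(term)
    indexed = {}
    for pos, term in index_tokens:
        indexed.setdefault(pos, set()).add(term)
    first = min(slots)
    for start in indexed:
        if all(slots[p] & indexed.get(start + p - first, set()) for p in slots):
            return True
    return False

index = analyze("Leann Rimes")
print(analyze("LeAnn"))                          # [(1, 'leann'), (1, 'le'), (2, 'ann')]
print(index)                                     # [(1, 'leann'), (2, 'rimes')]
print(phrase_matches(analyze("LeAnn"), index))   # False: nothing indexed as "ann"
print(phrase_matches(analyze("Rimes"), index))   # True
```

if this model is anywhere near right, it would also fit the analysis tool showing a match: the term "leann" is present on both sides, and only the position-sensitive phrase fails.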

configs follow.  thoughts appreciated.

--Geoff

  <fieldType name="search-en" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory" />
      <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
                                                      generateWordParts="1"
                                                      generateNumberParts="1"
                                                      catenateWords="1"
                                                      catenateNumbers="1"
                                                      catenateAll="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords-en.txt"/>
    </analyzer>

    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory" />
      <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
                                                      generateWordParts="1"
                                                      generateNumberParts="1"
                                                      catenateWords="0"
                                                      catenateNumbers="0"
                                                      catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords-en.txt"/>
    </analyzer>
  </fieldType>
