edismax using bigrams instead of phrases?

Bill Dueber Fri, 04 Dec 2009 08:27:20 -0800

I've started trying edismax and have noticed that my relevancy ranking is
messed up because, according to the debug output, it's using bigrams
instead of phrases and inexplicably ignoring a couple of the pf fields.
While the hit count isn't changing, this kills my ability to boost exact
title matches (or, I would guess, exact-anything-else matches, too).


debugQuery output can be seen at:

http://paste.lisp.org/display/91582

Both outputs there come from the exact same query; only the defType differs.

Note that instead of looking in the 'pf' fields for the search string "gone
with the wind", it's looking individually for "gone with", "with the", and
"the wind".
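
To make the difference concrete, here is a small Python sketch of the
splitting described above. It is only an illustration of the sub-phrases
that show up in the debug output, not edismax's actual code; the function
name word_bigrams is made up for this example.

```python
from typing import List


def word_bigrams(query: str) -> List[str]:
    """Return each pair of adjacent words in the query as a two-word phrase."""
    words = query.split()
    # Pair every word with its right-hand neighbour.
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


phrase = "gone with the wind"
print(phrase)                # the single phrase I expected to see in pf
print(word_bigrams(phrase))  # ['gone with', 'with the', 'the wind']
```

A title that contains all four words in order matches every one of those
pairs, but so does a title that merely contains the pairs scattered in
different places, which is why the exact-match boost loses its meaning.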

edismax is also completely ignoring the title_a and title_ab fields, which
are both of type "exactmatcher", defined as follows:

     <!-- Full string, stripped of \W and lowercased, for exact and
          left-anchored matching -->
     <fieldType name="exactmatcher" class="solr.TextField" omitNorms="true">
       <analyzer>
         <tokenizer class="solr.KeywordTokenizerFactory"/>
         <filter class="schema.UnicodeNormalizationFilterFactory"
                 version="icu4j" composed="false" remove_diacritics="true"
                 remove_modifiers="true" fold="true"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.TrimFilterFactory"/>
         <filter class="solr.PatternReplaceFilterFactory"
                 pattern="[^\p{L}\p{N}]" replacement="" replace="all"/>
       </analyzer>
     </fieldType>
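
For anyone who doesn't want to trace the filter chain by hand, the Python
sketch below approximates what this field type does to a value. It is an
assumption-laden stand-in, not the real analyzer: the Unicode filter is a
local class, so its behaviour is guessed from its attribute names (NFKD
decomposition plus removal of combining marks), the fold and
remove_modifiers options are not modelled at all, and the function name
exactmatcher_key is invented for this example.

```python
import unicodedata


def exactmatcher_key(text: str) -> str:
    """Approximate the single token the "exactmatcher" chain would produce."""
    # KeywordTokenizerFactory: the whole input stays one token, so no splitting.
    # Unicode filter (assumed): decompose, then drop combining diacritics.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # LowerCaseFilterFactory, then TrimFilterFactory.
    trimmed = stripped.lower().strip()
    # PatternReplaceFilterFactory with [^\p{L}\p{N}]: keep letters and numbers only.
    return "".join(ch for ch in trimmed if unicodedata.category(ch)[0] in ("L", "N"))


print(exactmatcher_key("Gone with the Wind"))     # gonewiththewind
print(exactmatcher_key("  Gone With the Wind! ")) # gonewiththewind
print(exactmatcher_key("gone with"))              # gonewith
```

The point of the field is that the whole title collapses to one key, so it
can only be matched by the whole query string; a two-word fragment such as
"gone with" produces a different key and cannot stand in for it.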


Any help on this would be much appreciated.


-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library
