I'm aware that including a field tokenized with KeywordTokenizerFactory in a dismax 'qf' will often result in 0 hits on that field when a whitespace-containing query is entered. But I do it anyway, because when a query without whitespace is entered, it hits. And in the cases where it doesn't hit, I figure the other fields in qf will hit or not, and that's good enough.

And usually that works. But it works _differently_ when my query contains an ampersand (or any other punctuation): it results in 0 hits when it shouldn't, and I can't figure out why.

Basically,

&defType=dismax&mm=100%&q=one : two&qf=text_field

gets hits. The ":" is thrown out by text_field, but the mm still passes somehow, right?

But, in the same index:

&defType=dismax&mm=100%&q=one : two&qf=text_field keyword_tokenized_text_field

gets 0 hits. Maybe including keyword_tokenized_text_field in the qf causes dismax to calculate the mm differently: it decides there are three tokens in the query and that all of them must match, and the token ":" can never match because it's stripped out and so isn't in my index. But somehow this isn't a problem unless I include a keyword-tokenized field in the qf?
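
To make that guess concrete, here is a toy model in Python of what I suspect is going on. It is only a sketch under assumptions, not Solr's actual code: I'm assuming dismax splits the query on whitespace, runs each chunk through every qf field's analyzer, drops a clause only when no field keeps a token for it, and then applies mm to the clauses that are left. The two analyzer functions are rough stand-ins I made up for an ordinary text analyzer and for KeywordTokenizerFactory.

```python
import re


def standard_like(text):
    """Rough stand-in for an ordinary text analyzer: lowercase, keep word tokens only."""
    return re.findall(r"\w+", text.lower())


def keyword_like(text):
    """Rough stand-in for KeywordTokenizerFactory: the whole input is one token."""
    return [text] if text else []


def build_clauses(query, analyzers):
    """Split on whitespace, analyze each chunk per field, drop clauses no field keeps."""
    clauses = []
    for chunk in query.split():
        # One disjunction per chunk, over the fields whose analyzer kept a token.
        kept = {}
        for field, analyze in analyzers.items():
            tokens = analyze(chunk)
            if tokens:
                kept[field] = tokens
        if kept:
            clauses.append((chunk, kept))
    return clauses


def required_clauses(query, analyzers):
    """With mm=100%, every surviving clause has to match."""
    return len(build_clauses(query, analyzers))


text_only = {"text_field": standard_like}
with_keyword = {
    "text_field": standard_like,
    "keyword_tokenized_text_field": keyword_like,
}

print(required_clauses("one : two", text_only))     # 2
print(required_clauses("one : two", with_keyword))  # 3
```

In this model, with only text_field the ":" chunk produces no tokens anywhere, so its clause disappears and mm=100% means 2 of 2. Once a keyword-style field is in the qf, ":" survives as a clause that only that field can satisfy, and mm=100% means 3 of 3.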

This is really confusing. If anyone has any idea what I'm talking about and can shed any light on it, that would be much appreciated.

The conclusion I am reaching is to just NEVER include anything but a more or less ordinarily tokenized field in a dismax qf. Sadly, doing so was useful for certain use cases of mine.

Oh, hey, the debugging trace would probably be useful:


<lst name="debug">
<str name="rawquerystring">
churchill : roosevelt
</str>
<str name="querystring">
churchill : roosevelt
</str>
<str name="parsedquery">
+((DisjunctionMaxQuery((isbn_t:churchill | title1_t:churchil)~0.01) DisjunctionMaxQuery((isbn_t::)~0.01) DisjunctionMaxQuery((isbn_t:roosevelt | title1_t:roosevelt)~0.01))~3) DisjunctionMaxQuery((title2_unstem:"churchill roosevelt"~3^240.0 | text:"churchil roosevelt"~3^10.0 | title2_t:"churchil roosevelt"~3^50.0 | author_unstem:"churchill roosevelt"~3^400.0 | title_exactmatch:churchill roosevelt^500.0 | title1_t:"churchil roosevelt"~3^60.0 | title1_unstem:"churchill roosevelt"~3^320.0 | author2_unstem:"churchill roosevelt"~3^240.0 | title3_unstem:"churchill roosevelt"~3^80.0 | subject_t:"churchil roosevelt"~3^10.0 | other_number_unstem:"churchill roosevelt"~3^40.0 | subject_unstem:"churchill roosevelt"~3^80.0 | title_series_t:"churchil roosevelt"~3^40.0 | title_series_unstem:"churchill roosevelt"~3^60.0 | text_unstem:"churchill roosevelt"~3^80.0)~0.01)
</str>
<str name="parsedquery_toString">
+(((isbn_t:churchill | title1_t:churchil)~0.01 (isbn_t::)~0.01 (isbn_t:roosevelt | title1_t:roosevelt)~0.01)~3) (title2_unstem:"churchill roosevelt"~3^240.0 | text:"churchil roosevelt"~3^10.0 | title2_t:"churchil roosevelt"~3^50.0 | author_unstem:"churchill roosevelt"~3^400.0 | title_exactmatch:churchill roosevelt^500.0 | title1_t:"churchil roosevelt"~3^60.0 | title1_unstem:"churchill roosevelt"~3^320.0 | author2_unstem:"churchill roosevelt"~3^240.0 | title3_unstem:"churchill roosevelt"~3^80.0 | subject_t:"churchil roosevelt"~3^10.0 | other_number_unstem:"churchill roosevelt"~3^40.0 | subject_unstem:"churchill roosevelt"~3^80.0 | title_series_t:"churchil roosevelt"~3^40.0 | title_series_unstem:"churchill roosevelt"~3^60.0 | text_unstem:"churchill roosevelt"~3^80.0)~0.01
</str>
<lst name="explain"/>
<str name="QParser">
DisMaxQParser
</str>
<null name="altquerystring"/>
<null name="boostfuncs"/>
<lst name="timing">
<double name="time">
6.0
</double>
<lst name="prepare">
<double name="time">
3.0
</double>
<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">
2.0
</double>
</lst>
<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">
0.0
</double>
</lst>
<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">
0.0
</double>
</lst>
<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">
0.0
</double>
</lst>
<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">
0.0
</double>
</lst>
<lst name="org.apache.solr.handler.component.SpellCheckComponent">
<double name="time">
0.0
</double>
</lst>
<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">
0.0
</double>
</lst>
</lst>
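
In case anyone wants to reproduce this and compare traces for the two qf settings, here is a small Python sketch that builds the two request URLs with debugQuery=true added. The base URL is a placeholder I made up, so substitute your own host and handler path. The one detail worth noticing is that the "%" in mm=100% and the ":" and spaces in q all need URL-encoding when the request is built by hand.

```python
from urllib.parse import urlencode

# Hypothetical base URL: replace with your own Solr host and handler path.
BASE = "http://localhost:8983/solr/select"


def debug_url(qf, q="one : two"):
    """Build a dismax request URL with debugging output switched on."""
    params = {
        "defType": "dismax",
        "mm": "100%",          # urlencode turns this into 100%25
        "q": q,
        "qf": qf,
        "debugQuery": "true",  # asks Solr to include the parsedquery trace
    }
    return BASE + "?" + urlencode(params)


print(debug_url("text_field"))
print(debug_url("text_field keyword_tokenized_text_field"))
```

Comparing the parsedquery of the two responses should show whether the ":" clause is present in one and absent in the other.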


