Hi, I am trying to configure Solr for Chinese search and I've been having
trouble getting the dismax query parser to behave correctly.

In schema.xml, I'm using SmartChineseAnalyzer on my fulltext field with
autoGeneratePhraseQueries="false".  I've verified that it is correctly
tokenizing Chinese words, and the query parser is in fact not generating
phrase queries.  But I can't figure out why dismax produces only a single
DisjunctionMaxQuery for multiple Chinese terms, so that the terms are
effectively ORed together, which is not what I want.
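
For reference, here is a minimal sketch of the kind of field type setup I am
describing.  The names "text_zh" and "fulltext" and the indexed/stored
settings are placeholders for illustration, not copied from my actual
schema.xml:

```xml
<!-- Sketch only: "text_zh" and "fulltext" are placeholder names. -->
<fieldType name="text_zh" class="solr.TextField"
           autoGeneratePhraseQueries="false">
  <!-- SmartChineseAnalyzer performs its own Chinese word segmentation. -->
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"/>
</fieldType>

<!-- indexed/stored values here are assumptions for the example. -->
<field name="fulltext" type="text_zh" indexed="true" stored="false"/>
```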

Here's an example of the parsed query debug output that I get for a
multiple term English query:

<str name="rawquerystring">my friend</str>
<str name="querystring">my friend</str>
<str name="parsedquery">
+((DisjunctionMaxQuery((t_field_keywords:unified_fulltext:my)~0.01)
DisjunctionMaxQuery((t_field_keywords:unified_fulltext:friend)~0.01))~2)
</str>
<str name="parsedquery_toString">
+(((t_field_keywords:unified_fulltext:my)~0.01
(t_field_keywords:unified_fulltext:friend)~0.01)~2)
</str>

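This is exactly what I want to happen for Chinese queries: each term gets
its own DisjunctionMaxQuery, and the trailing ~2 requires both of them to
match.  But for a Chinese query, you can see that I only get a single
DisjunctionMaxQuery object containing all three tokens: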

<str name="rawquerystring">我的朋友</str>
<str name="querystring">我的朋友</str>
<str name="parsedquery">
+DisjunctionMaxQuery(((t_field_keywords:unified_fulltext:我
t_field_keywords:unified_fulltext:的
t_field_keywords:unified_fulltext:朋友))~0.01)
</str>
<str name="parsedquery_toString">
+((t_field_keywords:unified_fulltext:我 t_field_keywords:unified_fulltext:的
t_field_keywords:unified_fulltext:朋友))~0.01
</str>

As a result, adding more terms to a query increases the number of results
instead of narrowing them as it should.

I feel like this is so close to working... does anybody know what I need to
do to get the query parser to behave correctly?  Any help would be much
appreciated!

Joel
