When I have a field using CJKBigramFilter, a query of CJK chars gets a differently structured parsedquery than a non-CJK query on the same field.
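To make the bigram counts in the debug output concrete, here is a rough Python sketch of the overlapping-bigram idea. It is my own illustration, not Solr's or Lucene's code. The function name cjk_bigrams is made up, and it assumes a single unbroken run of CJK characters with outputUnigrams="false".

```python
def cjk_bigrams(run):
    """Overlapping 2-character tokens for one unbroken run of CJK characters.

    Assumption: this only mimics the outputUnigrams="false" case. A lone
    character is kept as-is because it has no neighbour to pair with.
    """
    if len(run) < 2:
        return [run] if run else []
    return [run[i:i + 2] for i in range(len(run) - 1)]


print(cjk_bigrams("旧小说"))  # ['旧小', '小说']
```

A run of n characters gives n-1 bigrams, so the 3-char query turns into the two terms that show up in the parsed query.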
(旧小说 is 3 chars, so 2 bigrams.)

args sent in:

    q={!qf=bi_fld}旧小说&pf=&pf2=&pf3=

debugQuery:

    <str name="rawquerystring">{!qf=bi_fld}旧小说</str>
    <str name="querystring">{!qf=bi_fld}旧小说</str>
    <str name="parsedquery">(+DisjunctionMaxQuery((((bi_fld:旧小 bi_fld:小说)~2))~0.01) ())/no_coord</str>
    <str name="parsedquery_toString">+(((bi_fld:旧小 bi_fld:小说)~2))~0.01 ()</str>

If I use a non-CJK query string with the same field:

args sent in:

    q={!qf=bi_fld}foo bar&pf=&pf2=&pf3=

debugQuery:

    <str name="rawquerystring">{!qf=bi_fld}foo bar</str>
    <str name="querystring">{!qf=bi_fld}foo bar</str>
    <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:foo)~0.01) DisjunctionMaxQuery((bi_fld:bar)~0.01))~2))/no_coord</str>
    <str name="parsedquery_toString">+(((bi_fld:foo)~0.01 (bi_fld:bar)~0.01)~2)</str>

Why are the parsedquery_toString formulas different? And is there any difference in the actual relevancy formula?

How can you tell the difference between the MinNrShouldMatch and a qs or ps or tie value, if they are all represented as ~n in the parsedQuery string?

To try to get a handle on qs, ps, tie and mm:

args:

    q={!qf=bi_fld pf=bi_fld}"a b" c d&qs=5&ps=4

debugQuery:

    <str name="rawquerystring">{!qf=bi_fld pf=bi_fld}"a b" c d</str>
    <str name="querystring">{!qf=bi_fld pf=bi_fld}"a b" c d</str>
    <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:"a b"~5)~0.01) DisjunctionMaxQuery((bi_fld:c)~0.01) DisjunctionMaxQuery((bi_fld:d)~0.01))~3) DisjunctionMaxQuery((bi_fld:"c d"~4)~0.01))/no_coord</str>
    <str name="parsedquery_toString">+(((bi_fld:"a b"~5)~0.01 (bi_fld:c)~0.01 (bi_fld:d)~0.01)~3) (bi_fld:"c d"~4)~0.01</str>

I get that qs, the query slop, is for explicit phrases in the query, so "a b"~5 makes sense. I also get that ps is for boosting of phrases, so I get (bi_fld:"c d"~4) … but where is (cjk_uni_pub_search:"a b c d"~4)?
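On the ~n question, my working reading (an assumption on my part, based on how Lucene prints these query types) is this: a ~n right after a quoted phrase is a slop (qs or ps), the ~0.01 after a single-field group is the tie, and a ~n after a parenthesized group of several clauses is the minimum-should-match count. Under that reading the ~2 and ~3 above would come from mm. Here is a minimal Python sketch of how I understand the documented mm syntax, applied to the mm default in my request handler (6<-1 6<90%). It is not Solr's actual code, and the function names are mine.

```python
def _simple(n_clauses, value):
    """Apply one unconditional mm value: '3', '-1', '90%' or '-25%'."""
    if value.endswith("%"):
        # int() truncates toward zero, so 6.3 -> 6 and -1.75 -> -1.
        calc = int(n_clauses * int(value[:-1]) / 100)
    else:
        calc = int(value)
    # A negative value means "all but this many".
    return n_clauses + calc if calc < 0 else calc


def min_should_match(n_clauses, spec):
    """How many of n_clauses optional clauses must match under an mm spec."""
    result = n_clauses
    for part in spec.split():
        if "<" in part:
            bound, value = part.split("<", 1)
            if n_clauses <= int(bound):
                return result  # this condition does not apply; keep current result
            result = _simple(n_clauses, value)
        else:
            result = _simple(n_clauses, part)
    return min(n_clauses, result)


MM = "6<-1 6<90%"  # the mm default from my request handler
for n in (2, 3, 7, 10):
    print(n, min_should_match(n, MM))
```

With 2 or 3 clauses this sketch requires all of them (2 and 3), which would match the ~2 and ~3 in the parsed queries. The spec only relaxes things above 6 clauses, giving 6 of 7 and 9 of 10.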
Using dismax (instead of edismax):

args:

    q={!dismax qf=bi_fld pf=bi_fld}"a b" c d&qs=5&ps=4

debugQuery:

    <str name="rawquerystring">{!dismax qf=bi_fld pf=bi_fld}"a b" c d</str>
    <str name="querystring">{!dismax qf=bi_fld pf=bi_fld}"a b" c d</str>
    <str name="parsedquery">(+((DisjunctionMaxQuery((bi_fld:"a b"~5)~0.01) DisjunctionMaxQuery((bi_fld:c)~0.01) DisjunctionMaxQuery((bi_fld:d)~0.01))~3) DisjunctionMaxQuery((bi_fld:"a b c d"~4)~0.01))/no_coord</str>
    <str name="parsedquery_toString">+(((bi_fld:"a b"~5)~0.01 (bi_fld:c)~0.01 (bi_fld:d)~0.01)~3) (bi_fld:"a b c d"~4)~0.01</str>

So is this an edismax bug?

FYI, I am running Solr 4.4. I have fields defined like so:

    <fieldtype name="text_cjk_bi" class="solr.TextField" positionIncrementGap="10000" autoGeneratePhraseQueries="false">
      <analyzer>
        <tokenizer class="solr.ICUTokenizerFactory" />
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
        <filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true" katakana="true" hangul="true" outputUnigrams="false" />
      </analyzer>
    </fieldtype>

The request handler uses edismax:

    <requestHandler name="search" class="solr.SearchHandler" default="true">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="q.alt">*:*</str>
        <str name="mm">6<-1 6<90%</str>
        <int name="qs">1</int>
        <int name="ps">0</int>