The query may be the same, but your analyzers are radically different.

Just a hunch, but maybe GosenTokenizerFactory is treating the "." as a space (in 1.4 you were using SenTokenizerFactory), or maybe GosenBasicFormFilterFactory is. Either way, my guess is that "test.pdf" reaches WordDelimiterFilterFactory (WDF) as two separate tokens, which would produce the two-term query generated on 3.6 rather than the phrase query you see on 1.4.

To debug, remove WDF and the filters after it, then check whether the "." is still present at the point where WDF would have been invoked. No need to reindex; just reload Solr and look at the parsed query for test.pdf.
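
As a rough sketch of that debugging step, assuming the 3.6 text_sen field type quoted in the original message below, the analyzer could temporarily look something like this, with WDF and the TrimFilterFactory that follows it commented out. This is only an illustration for the experiment, not a recommended final configuration:

```xml
<fieldType name="text_sen" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="ja-mapping.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <tokenizer class="solr.GosenTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.GosenBasicFormFilterFactory"/>
    <!-- Temporarily disabled for debugging: WDF and everything after it.
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.TrimFilterFactory"/>
    -->
  </analyzer>
</fieldType>
```

After reloading, run the same test.pdf query with debugQuery=on. If the parsed query for fcontent_tsn_is still shows test and pdf as separate terms, the "." was lost before WDF; if it shows a single test.pdf term, the split is happening in WDF itself.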

-- Jack Krupansky

-----Original Message-----
From: Katsuyoshi NOGUCHI
Sent: Wednesday, May 16, 2012 6:03 AM
To: solr-user@lucene.apache.org
Subject: Dismax query results vary on Solr1.4 and 3.6.

Hi, guys! I need some advice.

When I send the same dismax query to Solr 1.4 and 3.6, the results for
search words analyzed by WordDelimiterFilterFactory differ, as shown
below:

[Search Word]
test.pdf

[Result]
Solr 1.4: the search word is analyzed as "test" AND "pdf"
Solr 3.6: the search word is analyzed as "test" OR "pdf"

In Solr 3.6, how can I get the same "test" AND "pdf" result as in
Solr 1.4?

[Japanese Analyzer]
Solr1.4 -> Sen
Solr3.6 -> lucene-gosen


Here are some examples of debug results in solrAdmin:
/*solrAdmin debug result-1.4*/
<lst name="debug">
<str name="rawquerystring">test.pdf</str>
<str name="querystring">test.pdf</str>
<str name="parsedquery">
+DisjunctionMaxQuery((fcontent_tsn_is:"test pdf" | fname_tbg_is:"test pdf")) ()
</str>
<str name="parsedquery_toString">
+(fcontent_tsn_is:"test pdf" | fname_tbg_is:"test pdf") ()
</str>
…
<str name="QParser">DisMaxQParser</str>
…
</lst>

/*solrAdmin debug result-3.6*/
<lst name="debug">
<str name="rawquerystring">test.pdf</str>
<str name="querystring">test.pdf</str>
<str name="parsedquery">
+DisjunctionMaxQuery(((fcontent_tsn_is:test fcontent_tsn_is:pdf) | (fname_tbg_is:test fname_tbg_is:pdf)))
</str>
<str name="parsedquery_toString">
+((fcontent_tsn_is:test fcontent_tsn_is:pdf) | (fname_tbg_is:test fname_tbg_is:pdf))
</str>
...
<str name="QParser">ExtendedDismaxQParser</str>
…
</lst>


The following are the request handlers used in Solr 1.4/3.6:

/*solrconfig.xml-1.4*/
<requestHandler name="dismax" class="solr.SearchHandler" >
<lst name="defaults">
<str name="defType">dismax</str>
<str name="echoParams">explicit</str>
<str name="q.alt">*:*</str>
<str name="qf">fcontent_tsn_is^1.0 fname_tbg_is^1.0 </str>
</lst>
</requestHandler>

/*solrconfig.xml-3.6*/
<requestHandler name="dismax" class="solr.SearchHandler" >
<lst name="defaults">
<str name="defType">edismax</str>
<str name="echoParams">explicit</str>
<str name="q.alt">*:*</str>
<str name="qf">content_tsn_is^1.0 name_tbg_is^1.0</str>
</lst>
</requestHandler>


The following are the schemas used in Solr 1.4/3.6:
/*schema.xml-1.4*/
<fieldType name="text_sen" class="solr.TextField">
<analyzer>
<tokenizer class="solrbook.analysis.SenTokenizerFactory" />
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
tokenizerFactory="solrbook.analysis.SenTokenizerFactory" ignoreCase="true"
expand="true"/>
</analyzer>
</fieldType>

<fields>
<dynamicField name="*_tsn_is"    type="text_sen"   indexed="true"
stored="true"   compressed="false" termVectors="true" termPositions="true"
termOffsets="true"  />
<dynamicField name="*_tbg_is"    type="text_bigram"   indexed="true"
stored="true"   compressed="false" termVectors="true" termPositions="true"
termOffsets="true"  />
</fields>

<solrQueryParser defaultOperator="AND"/>

/*schema.xml-3.6*/
<fieldType name="text_sen" class="solr.TextField">
<analyzer>
<charFilter class="solr.MappingCharFilterFactory" mapping="ja-mapping.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<tokenizer class="solr.GosenTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.GosenBasicFormFilterFactory" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.TrimFilterFactory" />
</analyzer>
</fieldType>

<fields>
<dynamicField name="*_tsn_is"    type="text_sen"   indexed="true"
stored="true"   compressed="false" termVectors="true" termPositions="true"
termOffsets="true"  />
<dynamicField name="*_tbg_is"    type="text_bigram"   indexed="true"
stored="true"   compressed="false" termVectors="true" termPositions="true"
termOffsets="true"  />
</fields>

<solrQueryParser defaultOperator="AND"/>


Regards.
