Yes, the mm is 100%. Thank you for a detailed answer.
Regards!
Dalius Sidlauskas
On 21/08/12 15:21, Jack Krupansky wrote:
Solr doesn't actually "know" any natural language, so it has no way of
assessing whether two token streams "have the same meaning." In your
case, the surface forms/syntax are subtly different - two separate
terms vs. a single source term with embedded punctuation.
It appears that you are using the edismax query parser and
probably have "mm" set to "100%" or "q.op" set to "AND" (the "~2"
indicates a BooleanQuery with a minMatch of 2 terms). An "mm" of "100%" is
equivalent to the "AND" operator, some/most of the time.
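For reference, a request handler along the following lines would produce a parsed query of that shape. This is only a sketch: the handler name and the "qf"/"pf" field lists are assumptions inferred from the ParsedQuery output in the original message, not taken from your actual solrconfig.xml.

```xml
<!-- solrconfig.xml: handler name and field lists are assumptions -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- use the extended dismax query parser -->
    <str name="defType">edismax</str>
    <!-- fields searched per term (one DisjunctionMaxQuery per term) -->
    <str name="qf">search_definitions search_title</str>
    <!-- phrase boost fields (the trailing "d osona" phrase clause) -->
    <str name="pf">search_definitions search_title^3.0</str>
    <!-- require all top-level terms to match, hence the "~2" -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```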
For the second query you have a "split-term" which is treated as a
single term/token until the fieldType analyzer splits it into two
terms and then does an "OR" of the sub-terms. Unfortunately, "mm" and
"q.op" are not passed down to the analyzer, so you have no way of
changing that "OR" to an "AND" - this is why you get different
results. But what you can do is set autoGeneratePhraseQueries="true"
on your field type(s) to cause the query parser to generate a phrase
query for "d osona" rather than the "OR". That's not the same as
"AND", but depending on your application it may be sufficient or even
preferable.
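A minimal sketch of what that could look like in schema.xml follows. The field type name "text_folded" and the positionIncrementGap value are hypothetical, and the single analyzer element (applied at both index and query time) is an assumption; only the three analysis components come from your configuration.

```xml
<!-- schema.xml: "text_folded" is a hypothetical field type name -->
<fieldType name="text_folded" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <!-- assumed: one analyzer shared by index and query time -->
  <analyzer>
    <!-- replace apostrophes with a space before tokenizing -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="’|'" replacement=" "/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

With this in place, a single source term that the analyzer splits into several tokens, such as d’Osona, should be parsed as the phrase "d osona" on that field instead of an "OR" of "d" and "osona".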
-- Jack Krupansky
-----Original Message-----
From: Dalius Sidlauskas
Sent: Tuesday, August 21, 2012 9:35 AM
To: solr-user@lucene.apache.org
Subject: Different queries for same meaning searches
Hello, here is my index and index analyzer configuration:
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="’|'"
replacement=" "/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ICUFoldingFilterFactory"/>
Searching for "d Osona" and "d’Osona" both create "d" and "osona" tokens, but
the ParsedQuery is different:
#1 "d Osona"
+((
DisjunctionMaxQuery((search_definitions:d | search_title:d))
DisjunctionMaxQuery((search_definitions:osona | search_title:osona))
)~2)
DisjunctionMaxQuery((search_definitions:"d osona" | search_title:"d osona"^3.0))
#2 "d’Osona"
+DisjunctionMaxQuery((
(search_definitions:d search_definitions:osona) |
(search_title:d search_title:osona)
))
DisjunctionMaxQuery((search_definitions:"d osona" | search_title:"d osona"^3.0))
And the results are different as well. Where can I find an explanation for
this?