TL;DR: How can I make Solr treat "ns1.define.logica.com" as a single token,
the same way it treats "ns.define.logica.com"?
We are just starting to use Solr 3.5.0 in production and have run into a
slightly surprising behaviour with the query "ns1.define.logica.com", sent
through an edismax handler with "q.op"=AND, which is defined as follows:
<requestHandler name="search" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<!-- #define customisations -->
<str name="defType">edismax</str>
<str name="q.op">AND</str>
<str name="qf">
body^0.5 comments^0.4 tags^1.2 title^2.0 involved^1.5 id^10.0
author^10.9 changed created oneline^0.7
</str>
<str name="pf">
body^0.2 tags^1.1 title^1.5
</str>
</lst>
</requestHandler>
The schema is defined with fields of type text_general, as found in the example
schema.xml, namely:
<fieldType name="text_general" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
The search string is being tokenised to "ns1", "define.logica.com", and the
resulting query becomes:
+DisjunctionMaxQuery((((tags:ns1 tags:define.logica.com)^1.2) |
id:ns1.define.logica.com^10.0 | ((body:ns1 body:define.logica.com)^0.5) |
((author:ns1 author:define.logica.com)^10.9) | ((oneline:ns1
oneline:define.logica.com)^0.7) | ((title:ns1 title:define.logica.com)^2.0) |
((involved:ns1 involved:define.logica.com)^1.5) | ((comments:ns1
comments:define.logica.com)^0.4))) DisjunctionMaxQuery((tags:"ns1
define.logica.com"^1.1 | body:"ns1 define.logica.com"^0.2 | title:"ns1
define.logica.com"^1.5))
meaning that documents containing "ns1" OR "define.logica.com" are returned,
despite "q.op"=AND. This is in contrast to e.g. "ns.define.logica.com", which
is treated as a single token. Is there a way I can make Solr treat both
queries the same way?
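One possible direction, offered as an untested sketch rather than a known fix:
define a separate field type whose tokenizer splits only on whitespace, so that
a hostname such as "ns1.define.logica.com" survives analysis as one token. The
name "text_hostname" below is hypothetical; solr.WhitespaceTokenizerFactory and
solr.LowerCaseFilterFactory are standard Solr factories.
<!-- Hypothetical field type (the name text_hostname is an assumption).
     Splits on whitespace only, so dots and digits never break a token. -->
<fieldType name="text_hostname" class="solr.TextField"
positionIncrementGap="100">
<!-- A single analyzer is applied at both index and query time -->
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
The trade-off is that punctuation adjacent to a word (a trailing comma or full
stop, for example) stays attached to the token, so this would probably suit a
dedicated field better than replacing text_general outright. Changing the
index-time analysis would also mean reindexing the affected fields.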
Many thanks, Alex
--
Alex Willmer | Developer