Hello, I've got strange results when I have special characters in my query.
Here is my request:

    q=histoire-france&start=0&rows=10&sort=score+desc&defType=dismax&qf=any^1.0&mm=100%

Parsed query:

    <str name="parsedquery_toString">+((any:histoir any:franc)) ()</str>

I get 17000 results because Solr is doing an OR (it should be an AND).

I have no problem when I use whitespace instead of a special character:

    q=histoire france&start=0&rows=10&sort=score+desc&defType=dismax&qf=any^1.0&mm=100%

    <str name="parsedquery_toString">+(((any:histoir) (any:franc))~2) ()</str>

2000 results for this query.

Here is my schema.xml (relevant parts):

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.CommonGramsFilterFactory" words="stopwords_french.txt" ignoreCase="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_french.txt" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!--<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>-->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.CommonGramsFilterFactory" words="stopwords_french.txt" ignoreCase="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_french.txt" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
      </analyzer>
    </fieldType>

I tried a PatternTokenizerFactory to tokenize on whitespace and special characters, but nothing changed. Even with a charFilter (PatternReplaceCharFilterFactory) that replaces special characters with whitespace, it doesn't work.

Here is the first line of the analysis in the Solr admin, with verbose output, for the query 'histoire-france':

    org.apache.solr.analysis.PatternReplaceCharFilterFactory {replacement= , pattern=([,;./\\'&-]), luceneMatchVersion=LUCENE_32}
    text    histoire france

The '-' is replaced by ' ', and the text is then tokenized by WhitespaceTokenizerFactory. However, I still get a different number of results for 'histoire-france' and 'histoire france'.

My current workaround is to replace all special characters with whitespace before sending the query to Solr, but that is not satisfying.

Did I miss something?
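
For reference, the client-side workaround mentioned above can be sketched roughly as follows. This is only a minimal illustration in Python: the client language and the function name `normalize_query` are assumptions made for the example, and the character class simply mirrors the pattern used in the PatternReplaceCharFilterFactory above.

```python
import re
from urllib.parse import urlencode

# Same character class as the charFilter pattern in the post: , ; . / \ ' & -
# (the trailing '-' is literal because it is the last character in the class).
SPECIAL_CHARS = re.compile(r"[,;./\\'&-]")


def normalize_query(user_query: str) -> str:
    """Hypothetical helper: replace special characters with spaces.

    Runs of whitespace are collapsed so the result contains only
    single-space-separated terms.
    """
    replaced = SPECIAL_CHARS.sub(" ", user_query)
    return " ".join(replaced.split())


# Build the request parameters from the post, using the cleaned query string.
params = {
    "q": normalize_query("histoire-france"),
    "start": 0,
    "rows": 10,
    "sort": "score desc",
    "defType": "dismax",
    "qf": "any^1.0",
    "mm": "100%",
}
query_string = urlencode(params)
```

With this, 'histoire-france' reaches Solr as 'histoire france', which is the form that gives the expected AND behaviour in my tests.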