I have come to the conclusion that this isn't possible because of the way
dismax queries are constructed. Someone else ran into exactly the same issue
last year:
http://lucene.472066.n3.nabble.com/Multi-word-exact-keyword-case-insensitive-search-suggestions-td2246516.html
I believe this makes it impossible to do exact matching on multi-word terms
with dismax.

So I have created two JIRA tickets that hopefully address the issue:
1) a suggested improvement to dismax specific to the KeywordTokenizerFactory: 
https://issues.apache.org/jira/browse/SOLR-3127
2) what I believe is a bug when removing terms from the query: 
https://issues.apache.org/jira/browse/SOLR-3128

Feedback welcome.

Thanks
Zac

-----Original Message-----
From: Zac Smith 
Sent: Friday, February 10, 2012 3:30 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Keyword Tokenizer Phrase Issue

Thanks, that explains why the individual terms 'chicken' and 'stock' are still 
in the query (and are required).
So I have tried a few things to get around this, but to no avail:

First, I changed the query analyzer to use the WhitespaceTokenizerFactory with
autoGeneratePhraseQueries=true. This creates the correct phrase query, but the
dismax query still requires the individual terms ('chicken' and 'stock') to
match:
+(DisjunctionMaxQuery((ingredient_synonyms:chicken)~0.01) 
+DisjunctionMaxQuery((ingredient_synonyms:stock)~0.01)) 
+DisjunctionMaxQuery((ingredient_synonyms:"chicken stock"~100)~0.01)

So the next thing I have tried is to remove the individual terms during the 
query analysis. I did this using the ShingleFilterFactory, so my query analyzer 
now looks like this:
<analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.ShingleFilterFactory" outputUnigrams="false"
                maxShingleSize="2" />
</analyzer>

This leaves the single term 'chicken stock' in the query analysis and the
dismax query is:
+() DisjunctionMaxQuery((ingredient_synonyms:chicken stock)~0.01)

Which looks OK except for the +(). It looks like it is requiring an empty 
clause.
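
One plausible reading of the empty +() clause, assuming dismax analyzes each
whitespace-separated word on its own for the required part of the query: with
outputUnigrams="false", a lone word produces no shingles at all, so that clause
ends up with no terms, while the full string 'chicken stock' still yields one
shingle. The Python sketch below only imitates the shingling step. The function
name and parameters are made up for illustration; it is not Lucene's
ShingleFilter code.

```python
def shingles(tokens, max_size=2, output_unigrams=False):
    """Imitate word shingling: join runs of 2..max_size adjacent tokens.

    Hypothetical helper for illustration only, not the Lucene implementation.
    """
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            # Single tokens are kept only when unigram output is enabled.
            out.append(tokens[i])
        for n in range(2, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


# Whole query string analyzed at once: one two-word shingle survives.
print(shingles("chicken stock".split()))

# Each word analyzed separately: nothing survives, so the clause is empty.
print(shingles(["chicken"]))
print(shingles(["stock"]))
```

If that reading is right, the empty clause comes from the per-word analysis
and not from the shingle of the full phrase.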

This seems like a pretty simple requirement: to only have exact matches on
multi-word text. Am I missing something here?

Thanks
Zac
