I have come to the conclusion that this isn't possible due to the way dismax queries are created. I found someone else who had the exact same issue last year:
http://lucene.472066.n3.nabble.com/Multi-word-exact-keyword-case-insensitive-search-suggestions-td2246516.html

I believe this makes it impossible to do exact matching on multi-word terms with dismax.
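To illustrate why the exact match fails, here is a toy Python model of the behavior described in this thread. It is not Solr code: `keyword_analyzer` and `toy_dismax_terms` are hypothetical names, and the model assumes that the query string is split on whitespace before the field's query analyzer runs, so a keyword-style analyzer only ever sees one word at a time.

```python
def keyword_analyzer(text):
    """Toy stand-in for a keyword tokenizer: the whole input becomes one token."""
    return [text]


def toy_dismax_terms(user_query, analyzer):
    """Toy model (assumption): split on whitespace first, then analyze each chunk."""
    terms = []
    for chunk in user_query.split():
        terms.extend(analyzer(chunk))
    return terms


# At index time the field value is analyzed as a whole.
indexed_tokens = keyword_analyzer("chicken stock")

# At query time each whitespace-separated word is analyzed on its own.
query_terms = toy_dismax_terms("chicken stock", keyword_analyzer)

# None of the per-word query terms equals the single indexed token.
matching_terms = [t for t in query_terms if t in indexed_tokens]

print("indexed:", indexed_tokens)
print("query terms:", query_terms)
print("matching terms:", matching_terms)
```

In this sketch the indexed token is the full string 'chicken stock', while the query side only produces 'chicken' and 'stock', so no query term can match the keyword-tokenized field.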
So I have created two JIRA tickets that hopefully address the issue:

1) a suggested improvement to dismax specific to the KeywordTokenizerFactory: https://issues.apache.org/jira/browse/SOLR-3127
2) what I believe is a bug when removing terms from the query: https://issues.apache.org/jira/browse/SOLR-3128

Feedback welcome.

Thanks
Zac

-----Original Message-----
From: Zac Smith
Sent: Friday, February 10, 2012 3:30 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Keyword Tokenizer Phrase Issue

Thanks, that explains why the individual terms 'chicken' and 'stock' are still in the query (and are required).

So I have tried a few things to get around this, but to no avail:

Changed the query analyzer to use the WhitespaceTokenizerFactory with autoGeneratePhraseQueries=true. This creates the correct phrase query, but the dismax query still requires the individual terms to match ('chicken' and 'stock'):

    +(DisjunctionMaxQuery((ingredient_synonyms:chicken)~0.01) +DisjunctionMaxQuery((ingredient_synonyms:stock)~0.01)) +DisjunctionMaxQuery((ingredient_synonyms:"chicken stock"~100)~0.01)

So the next thing I have tried is to remove the individual terms during the query analysis. I did this using the ShingleFilterFactory, so my query analyzer now looks like this:

    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory" />
      <filter class="solr.ShingleFilterFactory" outputUnigrams="false" maxShingleSize="2" />
    </analyzer>

This leaves the single term 'chicken stock' in the query analysis and the dismax query is:

    +() DisjunctionMaxQuery((ingredient_synonyms:chicken stock)~0.01)

Which looks OK except for the +(). It looks like it is requiring an empty clause.

This seems like a pretty simple requirement - to only have exact matches on multi word text. Am I missing something here?

Thanks
Zac