Thanks, that explains why the individual terms 'chicken' and 'stock' are still in the query (and are required). So I have tried a few things to get around this, but to no avail:
Changed the query analyzer to use the WhitespaceTokenizerFactory with autoGeneratePhraseQueries=true. This creates the correct phrase query, but the dismax query still requires the individual terms ('chicken' and 'stock') to match:

+(DisjunctionMaxQuery((ingredient_synonyms:chicken)~0.01) DisjunctionMaxQuery((ingredient_synonyms:stock)~0.01)) DisjunctionMaxQuery((ingredient_synonyms:"chicken stock"~100)~0.01)

The next thing I tried was to remove the individual terms during query analysis. I did this using the ShingleFilterFactory, so my query analyzer now looks like this:

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <filter class="solr.ShingleFilterFactory" outputUnigrams="false" maxShingleSize="2" />
</analyzer>

This leaves the single term 'chicken stock' in the query analysis, and the dismax query is:

+() DisjunctionMaxQuery((ingredient_synonyms:chicken stock)~0.01)

This looks OK except for the +(), which appears to require an empty clause.

This seems like a pretty simple requirement: to only have exact matches on multi-word text. Am I missing something here?

Thanks
Zac

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Friday, February 10, 2012 1:50 AM
To: solr-user@lucene.apache.org
Subject: RE: Keyword Tokenizer Phrase Issue

Hi Zac,

The Field Analysis tool (analysis.jsp) does not perform actual query parsing. One thing to be aware of when using the Keyword Tokenizer at query time: the query string (chicken stock) is pre-tokenized on whitespace before it reaches the keyword tokenizer. If you use quotes ("chicken stock"), however, the query parser does not pre-tokenize.

--- On Fri, 2/10/12, Zac Smith <z...@trinkit.com> wrote:

> From: Zac Smith <z...@trinkit.com>
> Subject: RE: Keyword Tokenizer Phrase Issue
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Date: Friday, February 10, 2012, 10:35 AM
>
> I have done some further analysis on this and I am now even more confused.
> When I use the Field Analysis tool with the text 'chicken stock' it highlights that text as a match.
> The dismax query looks ok to me:
> +(DisjunctionMaxQuery((ingredient_synonyms:chicken^0.6)~0.01) DisjunctionMaxQuery((ingredient_synonyms:stock^0.6)~0.01)) DisjunctionMaxQuery((ingredient_synonyms:chicken stock^0.6)~0.01)
>
> Then I ran an explainOther and it shows a failure to meet a condition. However, there does seem to be some kind of match registered:
> 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
>   0.0 = no match on required clause (ingredient_synonyms:chicken^0.6 ingredient_synonyms:stock^0.6)
>   0.0650662 = (MATCH) weight(ingredient_synonyms:chicken stock^0.6 in 0), product of:
>     0.21204369 = queryWeight(ingredient_synonyms:chicken stock^0.6), product of:
>       0.6 = boost
>       0.30685282 = idf(docFreq=1, maxDocs=1)
>       1.1517122 = queryNorm
>     0.30685282 = (MATCH) fieldWeight(ingredient_synonyms:chicken stock in 0), product of:
>       1.0 = tf(termFreq(ingredient_synonyms:chicken stock)=1)
>       0.30685282 = idf(docFreq=1, maxDocs=1)
>       1.0 = fieldNorm(field=ingredient_synonyms, doc=0)
>
> Any ideas?
>
> My dismax handler is set up like this:
> <requestHandler name="dismax" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="defType">dismax</str>
>     <str name="echoParams">explicit</str>
>     <float name="tie">0.01</float>
>     <str name="qf">ingredient_synonyms^0.6</str>
>     <str name="pf">ingredient_synonyms^0.6</str>
>   </lst>
> </requestHandler>
>
> Zac
>
> From: Zac Smith
> Sent: Thursday, February 09, 2012 12:52 PM
> To: solr-user@lucene.apache.org
> Subject: Keyword Tokenizer Phrase Issue
>
> Hi,
>
> I have a simple field type that uses the KeywordTokenizerFactory. I would like to use this so that values in this field are only matched with the full text of the field.
> e.g.
> If I indexed the text 'chicken stock', searches on this field would only match when searching for 'chicken stock'.
> If searching for just 'chicken' or just 'stock' there should be no match.
>
> This mostly works, except that if there is more than one word in the text I only get a match when searching with quotes.
> e.g.
> "chicken stock" (matches)
> chicken stock (doesn't match)
>
> Is there any way I can set this up so that I don't have to provide quotes? I am using dismax and if I put quotes in it will mess up the search for the rest of my fields. I had an idea that I could issue a separate search using the regular query parser, but couldn't work out how to do this.
> I thought I could do something like this:
> qt=dismax&q=fish OR _query_:ingredient:"chicken stock"
>
> I am using Solr 3.5.0. My field type is:
> <fieldType name="keyword_test" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>   </analyzer>
> </fieldType>
>
> Thanks
> Zac
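
For anyone following the thread, the pre-tokenization behaviour Ahmet describes can be illustrated with a small toy model. This is only a sketch in Python, not Solr's actual parsing code: the function names (pre_tokenize, keyword_tokenize, query_terms, matches) are made up for illustration, and the model ignores boosts, mm, and everything else dismax does. It only shows why an unquoted multi-word query can never equal a single keyword token in the index.

```python
import re


def pre_tokenize(query):
    """Toy stand-in for the query parser: split on whitespace,
    but keep double-quoted phrases together as one chunk."""
    # Each regex match is (quoted_phrase, bare_word); exactly one is non-empty.
    return [phrase or word
            for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query)]


def keyword_tokenize(text):
    """Toy stand-in for a keyword tokenizer: the whole input is one token."""
    return [text]


def query_terms(query):
    """Terms that reach the index lookup: each pre-tokenized chunk
    is analyzed separately by the keyword tokenizer."""
    return [token
            for chunk in pre_tokenize(query)
            for token in keyword_tokenize(chunk)]


def matches(indexed_value, query):
    """True only if every query term equals the single indexed token."""
    indexed_tokens = keyword_tokenize(indexed_value)
    terms = query_terms(query)
    return bool(terms) and all(term in indexed_tokens for term in terms)


if __name__ == "__main__":
    print(query_terms('chicken stock'))               # two separate terms
    print(query_terms('"chicken stock"'))             # one term
    print(matches('chicken stock', 'chicken stock'))
    print(matches('chicken stock', '"chicken stock"'))
```

In this model the unquoted query reaches the keyword tokenizer as two separate inputs, 'chicken' and 'stock', so neither can equal the indexed token 'chicken stock'; the quoted query arrives as one chunk and matches.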