Thanks, that explains why the individual terms 'chicken' and 'stock' are still 
in the query (and are required).
So I have tried a few things to get around this, but to no avail:

First, I changed the query analyzer to use the WhitespaceTokenizerFactory with 
autoGeneratePhraseQueries=true. This creates the correct phrase query, but the 
dismax query still requires the individual terms ('chicken' and 'stock') to 
match:
+(DisjunctionMaxQuery((ingredient_synonyms:chicken)~0.01) 
DisjunctionMaxQuery((ingredient_synonyms:stock)~0.01)) 
DisjunctionMaxQuery((ingredient_synonyms:"chicken stock"~100)~0.01)

Next, I tried removing the individual terms during query analysis using the 
ShingleFilterFactory, so my query analyzer now looks like this:
<analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.ShingleFilterFactory" outputUnigrams="false" maxShingleSize="2" />
</analyzer>
This leaves the single term 'chicken stock' in the query analysis and the 
dismax query is:
+() DisjunctionMaxQuery((ingredient_synonyms:chicken stock)~0.01)

This looks OK except for the +(), which appears to require an empty clause.
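
One possible explanation, not verified against the Solr source: if dismax analyzes each whitespace-separated chunk of the query on its own for qf, a shingle filter with outputUnigrams="false" has nothing to emit for a single word, so the required clause ends up empty, while the pf phrase sees the whole string and yields one shingle. The toy Python model below only illustrates that reasoning; the `shingles` function is hypothetical and is not Solr code.

```python
def shingles(tokens, max_size=2, output_unigrams=False):
    """Toy word-shingle filter: join runs of 2..max_size adjacent tokens."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for n in range(2, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out

query = "chicken stock"

# Assumption: each whitespace-separated chunk is analyzed separately.
per_chunk = [shingles(chunk.split()) for chunk in query.split()]

# Assumption: the phrase clause analyzes the whole query string at once.
whole = shingles(query.split())

print(per_chunk)  # one (empty) token list per chunk
print(whole)      # the single two-word shingle
```

If this reading is right, the empty +() is the per-word clause, and the shingle only survives in the phrase clause.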

This seems like a pretty simple requirement: to only have exact matches on 
multi-word text. Am I missing something here?

Thanks
Zac


-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Friday, February 10, 2012 1:50 AM
To: solr-user@lucene.apache.org
Subject: RE: Keyword Tokenizer Phrase Issue

Hi Zac,

The Field Analysis tool (analysis.jsp) does not perform actual query parsing.

One thing to be aware of when using the Keyword Tokenizer at query time: the 
query string (chicken stock) is pre-tokenized on whitespace before it reaches 
the keyword tokenizer.

If you use quotes ("chicken stock"), though, the query parser does not 
pre-tokenize.
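
As a rough illustration, here is a toy Python model of that behavior. It is not Solr's parser code, and the function names `pre_tokenize`, `keyword_analyze` and `query_terms` are made up for this sketch. It assumes the parser splits on whitespace unless the text is inside double quotes, and that the keyword tokenizer emits each chunk it receives as a single token.

```python
import re

def pre_tokenize(query):
    """Toy model: split on whitespace, keeping double-quoted phrases whole."""
    chunks = re.findall(r'"([^"]*)"|(\S+)', query)
    return [quoted if quoted else bare for quoted, bare in chunks]

def keyword_analyze(chunk):
    """A keyword tokenizer emits its entire input as one token."""
    return [chunk]

def query_terms(query):
    """Terms the field is searched for after pre-tokenization and analysis."""
    return [tok for chunk in pre_tokenize(query) for tok in keyword_analyze(chunk)]

indexed_term = "chicken stock"  # what the keyword tokenizer stored at index time

for q in ['chicken stock', '"chicken stock"']:
    terms = query_terms(q)
    print(q, terms, indexed_term in terms)
```

Under these assumptions, only the quoted form produces the single term 'chicken stock' that was indexed.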

--- On Fri, 2/10/12, Zac Smith <z...@trinkit.com> wrote:

> From: Zac Smith <z...@trinkit.com>
> Subject: RE: Keyword Tokenizer Phrase Issue
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Date: Friday, February 10, 2012, 10:35 AM
> 
> I have done some further analysis on this and I am now even more 
> confused. When I use the Field Analysis tool with the text 'chicken 
> stock' it highlights that text as a match.
> The dismax query looks ok to me:
> +(DisjunctionMaxQuery((ingredient_synonyms:chicken^0.6)~0.01)
> DisjunctionMaxQuery((ingredient_synonyms:stock^0.6)~0.01))
> DisjunctionMaxQuery((ingredient_synonyms:chicken stock^0.6)~0.01)
> 
> Then I ran an explainOther, and it shows a failure to meet a 
> condition. However, some kind of match does seem to be registered:
> 0.0 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
>   0.0 = no match on required clause (ingredient_synonyms:chicken^0.6 ingredient_synonyms:stock^0.6)
>   0.0650662 = (MATCH) weight(ingredient_synonyms:chicken stock^0.6 in 0), product of:
>     0.21204369 = queryWeight(ingredient_synonyms:chicken stock^0.6), product of:
>       0.6 = boost
>       0.30685282 = idf(docFreq=1, maxDocs=1)
>       1.1517122 = queryNorm
>     0.30685282 = (MATCH) fieldWeight(ingredient_synonyms:chicken stock in 0), product of:
>       1.0 = tf(termFreq(ingredient_synonyms:chicken stock)=1)
>       0.30685282 = idf(docFreq=1, maxDocs=1)
>       1.0 = fieldNorm(field=ingredient_synonyms, doc=0)
> 
> Any ideas?
> 
> My dismax handler is set up like this:
>   <requestHandler name="dismax" class="solr.SearchHandler">
>     <lst name="defaults">
>       <str name="defType">dismax</str>
>       <str name="echoParams">explicit</str>
>       <float name="tie">0.01</float>
>       <str name="qf">ingredient_synonyms^0.6</str>
>       <str name="pf">ingredient_synonyms^0.6</str>
>     </lst>
>   </requestHandler>
> 
> Zac
> 
> From: Zac Smith
> Sent: Thursday, February 09, 2012 12:52 PM
> To: solr-user@lucene.apache.org
> Subject: Keyword Tokenizer Phrase Issue
> 
> Hi,
> 
> I have a simple field type that uses the KeywordTokenizerFactory. I 
> would like to use this so that values in this field are only matched 
> with the full text of the field.
> e.g. If I indexed the text 'chicken stock', searches on this field 
> would only match when searching for 'chicken stock'.
> A search for just 'chicken' or just 'stock' should not match.
> 
> This mostly works, except if there is more than one word in the text I 
> only get a match when searching with quotes.
> e.g.
> "chicken stock" (matches)
> chicken stock (doesn't match)
> 
> Is there any way I can set this up so that I don't have to provide 
> quotes? I am using dismax, and if I put quotes in, it will mess up the 
> search for the rest of my fields. I had an idea that I could issue a 
> separate search using the regular query parser, but couldn't work out 
> how to do it. I thought I could do something like this:
> qt=dismax&q=fish OR _query_:ingredient:"chicken stock"
> 
> I am using solr 3.5.0. My field type is:
> <fieldType name="keyword_test" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>   </analyzer>
> </fieldType>
> 
> Thanks
> Zac
> 

