Re: Can't get phrase field boosting to work using edismax

Jan Høydahl Wed, 06 Apr 2016 01:44:10 -0700

Hi,

Phrase match via “pf” requires the target field to contain a phrase, i.e. 
multiple tokens. Yours does not, since you use the KeywordTokenizer, which 
leaves only one token in the field. The edismax pf boost will thus never kick 
in. Please point your “pf” towards a tokenized field.
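A rough sketch of this point in plain Java, without Lucene. `keywordAnalyze` and `whitespaceAnalyze` are hypothetical helpers that only imitate a KeywordTokenizerFactory + LowerCaseFilterFactory chain and a whitespace-splitting chain. The HTML strip char filter is not modelled. The keyword chain turns "some words" into a single token, so pf has no phrase to match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PhraseTokenDemo {

    // Imitates KeywordTokenizer + LowerCaseFilter: the whole input is one token.
    static List<String> keywordAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        if (!text.isEmpty()) {
            tokens.add(text.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Imitates a whitespace tokenizer + lowercase filter: one token per word.
    static List<String> whitespaceAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A phrase, in the sense used by pf, needs at least two tokens.
    static boolean formsPhrase(List<String> tokens) {
        return tokens.size() >= 2;
    }

    public static void main(String[] args) {
        String query = "some words";
        List<String> keyword = keywordAnalyze(query);
        List<String> tokenized = whitespaceAnalyze(query);
        System.out.println("keyword:   " + keyword + " phrase=" + formsPhrase(keyword));
        System.out.println("tokenized: " + tokenized + " phrase=" + formsPhrase(tokenized));
    }
}
```

On the keyword side the whole title is one token. Only the whitespace-tokenized version yields the two positions a phrase needs.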


If what you are trying to achieve is to boost only when the whole query exactly 
matches the full content of the field, then have a look at my solution here 
https://github.com/cominvent/exactmatch

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 5 Apr 2016, at 19:10, jimi.hulleg...@svensktnaringsliv.se wrote:
> 
> Some more input, before I call it a day. Just for the heck of it, I tried 
> changing minClauseSize to 0 using the Eclipse debugger, so that it didn't 
> return null at line 1203, but instead returned the TermQuery on line 1205. 
> Then everything worked exactly as it should. The matching document got 
> boosted as expected. And in the explain output, this can be seen:
> 
> [...]
> 11.274228 = (MATCH) weight(exactTitle:some words^100.0 in 172) 
> [DefaultSimilarity], result of:
> [...]
> 
> So, in my case, having minClauseSize=2 on line 550 (line 565 for Solr 5.5.0) 
> is the culprit. Is this a bug, or am I using pf the wrong way? Can 
> someone explain why minClauseSize can't be set to 0 here? The comment simply 
> states "we need at least two or there shouldn't be a boost", but gives no 
> explanation of *why* at least two are needed.
> 
> Regards
> /Jimi
> 
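The rule Jimi found in the debugger fits in a few lines. This is a simplified, hypothetical sketch and does not reproduce the Solr source. `QueryKind` and `pfClauseOrNull` are invented names. The only rule taken from this thread is this one: a pf clause that analyzes to something other than a PhraseQuery or MultiPhraseQuery is dropped whenever minClauseSize > 1. The real method may also check how many terms a phrase query has. That check is left out here.

```java
public class MinClauseSizeDemo {

    // Hypothetical stand-in for the query types mentioned in the thread.
    enum QueryKind { TERM, PHRASE, MULTI_PHRASE }

    // Returns a description of the pf clause, or null when the clause is dropped.
    static String pfClauseOrNull(QueryKind kind, int minClauseSize, String text) {
        boolean phraseLike = kind == QueryKind.PHRASE || kind == QueryKind.MULTI_PHRASE;
        if (!phraseLike && minClauseSize > 1) {
            // A single TermQuery (what a keyword field produces) is discarded.
            return null;
        }
        return kind + "(" + text + ")";
    }

    public static void main(String[] args) {
        // Keyword field with minClauseSize=2, as set for pf: no pf clause.
        System.out.println(pfClauseOrNull(QueryKind.TERM, 2, "exactTitle:some words"));
        // Jimi's debugger experiment, minClauseSize forced to 0: the TermQuery survives.
        System.out.println(pfClauseOrNull(QueryKind.TERM, 0, "exactTitle:some words"));
        // Tokenized field: the query is a PhraseQuery, so the boost is kept.
        System.out.println(pfClauseOrNull(QueryKind.PHRASE, 2, "title:\"some words\""));
    }
}
```

Running it prints `null` for the first call and a clause for the other two. This matches what Jimi saw before and after changing minClauseSize.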
> -----Original Message-----
> From: jimi.hulleg...@svensktnaringsliv.se 
> [mailto:jimi.hulleg...@svensktnaringsliv.se] 
> Sent: Tuesday, April 5, 2016 6:51 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Can't get phrase field boosting to work using edismax
> 
> I have now used the Eclipse debugger to try to understand what is 
> happening, and it seems like the ExtendedDismaxQParser simply ignores my pf 
> parameter, since it doesn't interpret it as a phrase query.
> 
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.6.0/solr/core/src/java/org/apache/solr/search/ExtendedDismaxQParser.java
> 
> On line 1180 I get a query object of type TermQuery (with the term 
> "exactTitle:some words"). And in the if statements starting at line it is 
> quite clear that if it is neither a PhraseQuery nor a MultiPhraseQuery, and 
> minClauseSize > 1 (it is set to 2 on line 550), the method simply returns 
> null (i.e. my pf parameter is ignored). Why is this happening?
> 
> I use Solr 4.6 by the way... I forgot to mention that in my original message.
> 
> 
> -----Original Message-----
> From: jimi.hulleg...@svensktnaringsliv.se 
> [mailto:jimi.hulleg...@svensktnaringsliv.se]
> Sent: Tuesday, April 5, 2016 5:36 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Can't get phrase field boosting to work using edismax
> 
> OK. Interesting. But... I added a solr.TrimFilterFactory at the end of my 
> analyzer definition. Shouldn't that take care of the added space at the end? 
> The admin analysis page indicates that it works as it should, but I still 
> can't get edismax to boost.
> 
> -----Original Message-----
> From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
> Sent: Tuesday, April 5, 2016 4:42 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can't get phrase field boosting to work using edismax
> 
> It looks like the code that constructs the boost phrase for pf always adds a 
> trailing blank. That is never a problem with a normal tokenizer, which 
> removes whitespace, but the keyword tokenizer preserves the extra space, 
> which prevents an exact match.
> 
> See line 531:
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/solr/core/src/java/org/apache/solr/search/ExtendedDismaxQParser.java
> 
> I'd say it's a bug, but more of a narrow use case that wasn't considered or 
> tested.
> 
> -- Jack Krupansky
> 
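Plain strings can show both Jack's trailing-blank observation and the TrimFilterFactory workaround Jimi tried. The sketch assumes, as Jack describes, that the pf phrase is built by appending a blank after every term. `buildPhrase` and `keywordToken` are hypothetical helpers and come from neither edismax nor Lucene. In Jimi's Solr 4.6 setup, trimming alone did not bring the boost back. The single-token query is also discarded by the minClauseSize check discussed in this thread.

```java
import java.util.List;
import java.util.Locale;

public class TrailingBlankDemo {

    // Hypothetical: joins the query terms, appending a blank after each one.
    static String buildPhrase(List<String> terms) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            sb.append(term).append(' ');
        }
        return sb.toString();
    }

    // Keyword-style analysis: one lowercased token, optionally trimmed
    // (imitating a TrimFilterFactory at the end of the chain).
    static String keywordToken(String text, boolean trim) {
        String token = text.toLowerCase(Locale.ROOT);
        return trim ? token.trim() : token;
    }

    public static void main(String[] args) {
        String indexed = keywordToken("some words", false);
        String phrase = buildPhrase(List.of("some", "words"));

        System.out.println("[" + phrase + "]");
        System.out.println("untrimmed match: " + indexed.equals(keywordToken(phrase, false)));
        System.out.println("trimmed match:   " + indexed.equals(keywordToken(phrase, true)));
    }
}
```

The untrimmed token "some words " differs from the indexed "some words". The trimmed one is equal.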
> On Tue, Apr 5, 2016 at 7:50 AM, <jimi.hulleg...@svensktnaringsliv.se> wrote:
> 
>> Hi,
>> 
>> I'm trying to boost documents using phrase field boosting (ie the pf 
>> parameter for edismax), but I can't get it to work (ie boosting 
>> documents where the pf field matches the query as a phrase).
>> 
>> As far as I can tell, Solr, or more specifically the edismax handler, 
>> does *something* when I add this parameter. I know this because the QTime 
>> increases from around 5-10 ms to around 30-40 ms, and the score explain 
>> structure is *slightly* modified (though with the same final score for 
>> all documents). But nowhere in the explain structure can I see 
>> anything about the pf, and I can't understand that. Shouldn't it be 
>> included in the explain? If not, is there any way to force its inclusion?
>> 
>> The query looks something like this:
>> 
>> 
>> ?q=some+words&rows=10&sort=score+desc&debugQuery=true&fl=objectid,exactTitle,score%2C%5Bexplain+style%3Dtext%5D&qf=title%5E2&qf=swedishText1%5E1&defType=edismax&pf=exactTitle%5E5&wt=xml&indent=true
>> 
>> 
>> I have one document that has the title "some words", and when I do a 
>> simple query filter with exactTitle:"some words" I get a match for 
>> that document. So then I would expect that the query above would boost 
>> this document, and include information about this in the explain. But 
>> nothing like this happens, and I can't understand why.
>> 
>> The field looks like this:
>> 
>> <field name="exactTitle" type="keywordText" indexed="true" stored="true" required="false" multiValued="false" />
>> 
>> And the fieldType looks like this:
>> 
>> <fieldType name="keywordText" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer>
>>     <charFilter class="solr.HTMLStripCharFilterFactory" />
>>     <tokenizer class="solr.KeywordTokenizerFactory" />
>>     <filter class="solr.LowerCaseFilterFactory" />
>>   </analyzer>
>> </fieldType>
>> 
>> 
>> I have also tried boosting this document using a boost query, ie 
>> bq=exactTitle:"some words", and this works as expected. The document 
>> score is boosted, and the explain states this very clearly, with this 
>> segment:
>> 
>> [...]
>> 9.870669 = (MATCH) weight(exactTitle:some words^5.0 in 12) 
>> [DefaultSimilarity], result of:
>> [...]
>> 
>> Why is this working, but q=some+words&pf=exactTitle^5 not? Shouldn't 
>> edismax rewrite my "pf query" into something very similar to the "bq query"?
>> 
>> Regards
>> /Jimi
>>
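The request in the original message is URL-encoded, which makes the parameters hard to read. The small sketch below uses the JDK's `URLDecoder` (Java 10+ for the `Charset` overload) to decode them. The class name `DecodeSolrParams` is arbitrary, and the query string is copied from the message. Decoding shows two `qf` entries, `pf=exactTitle^5`, and an `fl` that includes `[explain style=text]`.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DecodeSolrParams {

    // Splits a query string into decoded {name, value} pairs, keeping repeated names.
    static List<String[]> decode(String queryString) {
        String qs = queryString.startsWith("?") ? queryString.substring(1) : queryString;
        List<String[]> params = new ArrayList<>();
        for (String pair : qs.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.add(new String[] {
                URLDecoder.decode(name, StandardCharsets.UTF_8),
                URLDecoder.decode(value, StandardCharsets.UTF_8)
            });
        }
        return params;
    }

    public static void main(String[] args) {
        String query = "?q=some+words&rows=10&sort=score+desc&debugQuery=true"
            + "&fl=objectid,exactTitle,score%2C%5Bexplain+style%3Dtext%5D"
            + "&qf=title%5E2&qf=swedishText1%5E1&defType=edismax"
            + "&pf=exactTitle%5E5&wt=xml&indent=true";
        for (String[] p : decode(query)) {
            System.out.println(p[0] + " = " + p[1]);
        }
    }
}
```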