RE: Can't get phrase field boosting to work using edismax

jimi.hullegard Wed, 06 Apr 2016 02:27:58 -0700

OK, well, I'm not sure I agree with you. First of all, you ask me to point my 
"pf" towards a tokenized field, but I already do that (the fact that all text 
is tokenized into a single token doesn't change that). Also, I don't agree 
with the view that a single-term phrase is never valid or reasonable. In this 
specific case, with a KeywordTokenizer, I see it as very reasonable indeed. And 
I would consider a "single-term keyword phrase" solution more logical than a 
workaround that inserts special magic characters into the text. Just my two 
cents... :)


Oh, hang on... If a phrase is defined as multiple tokens, and pf is used for 
phrase boosting, does that mean that even with a regular tokenizer pf won't 
work for fields that contain only one word? For example, if the title of one 
document is "John", and the user searches for 'John' (without any surrounding 
quotes), will edismax not boost this document?
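
To make the question concrete, here is a rough Python model of the decision as 
it is described in the debugging message quoted further down (a non-phrase 
query is dropped whenever minClauseSize > 1). It is only a sketch under that 
assumption, not Solr's code: the function name pf_boost_clause and its 
parameters are made up for illustration, and the real parser builds Lucene 
query objects rather than strings.

```python
def pf_boost_clause(field, tokens, boost, min_clause_size=2):
    """Toy model of the pf decision described in this thread (not Solr code).

    tokens is the list of tokens the field's analyzer produced for the
    user's query text. Returns a query string, or None for "no boost".
    """
    if not tokens:
        return None
    if len(tokens) >= 2:
        # Several tokens: analysis yields a phrase query.
        if min_clause_size > 1 and len(tokens) < min_clause_size:
            return None
        phrase = " ".join(tokens)
        return f'{field}:"{phrase}"^{boost}'
    # A single token: analysis yields a plain term query, which is
    # dropped whenever min_clause_size > 1.
    if min_clause_size > 1:
        return None
    return f"{field}:{tokens[0]}^{boost}"


# Single-word query against a normally tokenized title field.
print(pf_boost_clause("title", ["john"], 5))
# Same query with the minimum lowered, as in the debugger experiment.
print(pf_boost_clause("title", ["john"], 5, min_clause_size=0))
# Two-word query: the phrase boost is generated.
print(pf_boost_clause("title", ["some", "words"], 5))
```

If this model is faithful, a one-word query never gets a pf clause, whatever 
tokenizer is used. The document can still match and score through qf.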

/Jimi

-----Original Message-----
From: Jan Høydahl [mailto:jan....@cominvent.com] 
Sent: Wednesday, April 6, 2016 10:43 AM
To: solr-user@lucene.apache.org
Subject: Re: Can't get phrase field boosting to work using edismax

Hi,

Phrase match via “pf” requires the target field to contain a phrase. A phrase 
is defined as multiple tokens. Yours does not contain a phrase since you use 
the KeywordTokenizer, leaving only one token in the field. eDismax pf will thus 
never kick in. Please point your “pf” towards a tokenized field.
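
As a small illustration of the point about token counts, the following Python 
sketch compares a keyword-style tokenizer with a whitespace-splitting one. 
Both functions are simplified stand-ins written for this example (they are not 
the Solr KeywordTokenizer or any Lucene API), and lowercasing is added only to 
mirror the LowerCaseFilterFactory in the field type quoted below.

```python
def keyword_tokenize(text):
    # Stand-in for a keyword tokenizer: the whole input becomes one token.
    return [text.lower()]


def whitespace_tokenize(text):
    # Stand-in for a tokenizer that splits on whitespace.
    return text.lower().split()


def is_phrase(tokens):
    # "A phrase is defined as multiple tokens."
    return len(tokens) >= 2


title = "some words"
for tokenize in (keyword_tokenize, whitespace_tokenize):
    tokens = tokenize(title)
    print(tokenize.__name__, tokens, "phrase" if is_phrase(tokens) else "single token")
```

With the keyword-style stand-in the field holds the single token "some words", 
so by the definition above there is no phrase for pf to match.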

If what you are trying to achieve is to boost only when the whole query exactly 
matches the full content of the field, then have a look at my solution here 
https://github.com/cominvent/exactmatch

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 5 Apr 2016, at 19:10, jimi.hulleg...@svensktnaringsliv.se wrote:
> 
> Some more input, before I call it a day. Just for the heck of it, I tried 
> changing minClauseSize to 0 using the Eclipse debugger, so that it didn't 
> return null at line 1203, but instead returned the TermQuery on line 1205. 
> Then everything worked exactly as it should. The matching document got 
> boosted as expected. And in the explain output, this can be seen:
> 
> [...]
> 11.274228 = (MATCH) weight(exactTitle:some words^100.0 in 172) 
> [DefaultSimilarity], result of:
> [...]
> 
> So, in my case, having minClauseSize=2 on line 550 (line 565 for Solr 5.5.0) 
> is the culprit. Is this a bug, or am I using pf in the wrong way? Can 
> someone explain why minClauseSize can't be set to 0 here? The comment simply 
> states "we need at least two or there shouldn't be a boost", but gives no 
> explanation of *why* at least two are needed.
> 
> Regards
> /Jimi
> 
> -----Original Message-----
> From: jimi.hulleg...@svensktnaringsliv.se 
> [mailto:jimi.hulleg...@svensktnaringsliv.se]
> Sent: Tuesday, April 5, 2016 6:51 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Can't get phrase field boosting to work using edismax
> 
> I have now used the Eclipse debugger to try to understand what is 
> happening, and it seems that the ExtendedDismaxQParser simply ignores my pf 
> parameter, since it doesn't interpret it as a phrase query.
> 
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.6.0/solr/core/src/java/org/apache/solr/search/ExtendedDismaxQParser.java
> 
> On line 1180 I get a query object of type TermQuery (with the term 
> "exactTitle:some words"). And in the if statements that follow, it is 
> quite clear that if the query is not a PhraseQuery or a MultiPhraseQuery, 
> and minClauseSize > 1 (it is set to 2 on line 550), the method simply returns 
> null (i.e. it ignores my pf parameter). Why is this happening?
> 
> I use Solr 4.6 by the way... I forgot to mention that in my original message.
> 
> 
> -----Original Message-----
> From: jimi.hulleg...@svensktnaringsliv.se 
> [mailto:jimi.hulleg...@svensktnaringsliv.se]
> Sent: Tuesday, April 5, 2016 5:36 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Can't get phrase field boosting to work using edismax
> 
> OK. Interesting. But... I added a solr.TrimFilterFactory at the end of my 
> analyzer definition. Shouldn't that take care of the added space at the end? 
> The admin analysis page indicates that it works as it should, but I still 
> can't get edismax to boost.
> 
> -----Original Message-----
> From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
> Sent: Tuesday, April 5, 2016 4:42 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can't get phrase field boosting to work using edismax
> 
> It looks like the code constructing the boost phrase for pf will always add a 
> trailing blank, which is never a problem when a normal tokenizer is used that 
> removes white space, but the keyword tokenizer will preserve that extra 
> space, which prevents an exact match.
> 
> See line 531:
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/5.5.0/solr/core/src/java/org/apache/solr/search/ExtendedDismaxQParser.java
> 
> I'd say it's a bug, but more a narrow use case that wasn't considered or 
> tested.
> 
> -- Jack Krupansky
> 
> On Tue, Apr 5, 2016 at 7:50 AM, <jimi.hulleg...@svensktnaringsliv.se> wrote:
> 
>> Hi,
>> 
>> I'm trying to boost documents using phrase field boosting (i.e. the 
>> pf parameter for edismax), but I can't get it to work (i.e. to boost 
>> documents where the pf field matches the query as a phrase).
>> 
>> As far as I can tell, Solr, or more specifically the edismax handler, 
>> does *something* when I add this parameter. I know this because the 
>> QTime increases from around 5-10 ms to around 30-40 ms, and the score 
>> explain structure is *slightly* modified (though with the same final 
>> score for all documents). But nowhere in the explain structure can I 
>> see anything about the pf, and I can't understand that. Shouldn't it 
>> be included in the explain? If not, is there any way to force it to be 
>> included?
>> 
>> The query looks something like this:
>> 
>> 
>> ?q=some+words&rows=10&sort=score+desc&debugQuery=true&fl=objectid,exactTitle,score%2C%5Bexplain+style%3Dtext%5D&qf=title%5E2&qf=swedishText1%5E1&defType=edismax&pf=exactTitle%5E5&wt=xml&indent=true
>> 
>> 
>> I have one document that has the title "some words", and when I do a 
>> simple query filter with exactTitle:"some words" I get a match for 
>> that document. So then I would expect that the query above would 
>> boost this document, and include information about this in the 
>> explain. But nothing like this happens, and I can't understand why.
>> 
>> The field looks like this:
>> 
>> <field name="exactTitle" type="keywordText" indexed="true" stored="true" required="false" multiValued="false" />
>> 
>> And the fieldType looks like this:
>> 
>> <fieldType name="keywordText" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer>
>>     <charFilter class="solr.HTMLStripCharFilterFactory" />
>>     <tokenizer class="solr.KeywordTokenizerFactory" />
>>     <filter class="solr.LowerCaseFilterFactory" />
>>   </analyzer>
>> </fieldType>
>> 
>> 
>> I have also tried boosting this document using a boost query, ie 
>> bq=exactTitle:"some words", and this works as expected. The document 
>> score is boosted, and the explain states this very clearly, with this 
>> segment:
>> 
>> [...]
>> 9.870669 = (MATCH) weight(exactTitle:some words^5.0 in 12) 
>> [DefaultSimilarity], result of:
>> [...]
>> 
>> Why is this working, but q=some+words&pf=exactTitle^5 not? Shouldn't 
>> edismax rewrite my "pf query" into something very similar to the "bq query"?
>> 
>> Regards
>> /Jimi
>>
