I also tend to use "sentinel tokens" for exact matching or to anchor a search.
But to get a boost that decays the further down in the article a match occurs,
you'd need to write several such span/slop queries with varying slops, e.g.
highest boost for the first 10 words, medium boost for the first 50 words, low
boost for the first 150 words, and no boost beyond that.
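
For illustration, the tiered workaround could be generated roughly like this.
It is only a sketch in Python that emits XML for the XML query parser Mikhail
pointed to; the field name "body", the term and the boost values are made up,
and the assumption is that SpanFirst, SpanTerm and TermQuery accept the "end",
"boost" and "fieldName" attributes as in Lucene's XML query syntax.

```python
from xml.etree import ElementTree as ET

# Hypothetical tiers: (SpanFirst end position, boost). Illustrative values only.
TIERS = [(10, 3.0), (50, 2.0), (150, 1.0)]


def tiered_span_first(field, term, tiers=TIERS):
    """Build an XML query: the term must match somewhere in the field,
    plus one optional SpanFirst clause per tier that adds a boost."""
    root = ET.Element("BooleanQuery")

    # Base clause so that matches beyond the last tier still count (no boost).
    base = ET.SubElement(root, "Clause", occurs="must")
    term_query = ET.SubElement(base, "TermQuery", fieldName=field)
    term_query.text = term

    # One optional clause per tier; an early match satisfies several tiers,
    # so their boosts add up.
    for end, boost in tiers:
        clause = ET.SubElement(root, "Clause", occurs="should")
        span_first = ET.SubElement(
            clause, "SpanFirst", end=str(end), boost=str(boost)
        )
        span_term = ET.SubElement(span_first, "SpanTerm", fieldName=field)
        span_term.text = term

    return ET.tostring(root, encoding="unicode")


xml_query = tiered_span_first("body", "shoes")
print(xml_query)
```

The resulting string would be sent as the query with the {!xmlparser} prefix.
It is still a step function rather than a smooth decay, which is the
limitation described above.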

As I wrote in my initial mail, we can do such workarounds, or play with 
payloads etc. But my real question is whether/how it is possible to factor the 
actual term offset information from a matching term into the scoring algorithm? 
Would you need to implement your own Scorer/Weight impl?
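
As a rough illustration of the kind of scoring meant here, in plain Python
rather than Lucene code: each occurrence of the term contributes a weight that
shrinks with its position. The function name and the half_life parameter are
hypothetical. In Lucene the positions would presumably be read from the
postings (e.g. PostingsEnum.nextPosition()) inside a custom query; that wiring
is not shown.

```python
def position_decayed_boost(positions, half_life=50.0):
    """Sum a per-occurrence weight that halves every `half_life` tokens.

    `positions` are the 0-based token positions of the matching term
    within the field. `half_life` is a made-up tuning knob.
    """
    return sum(0.5 ** (pos / half_life) for pos in positions)


# A term near the start of the field outweighs one far down in the text.
early = position_decayed_boost([0, 3])
late = position_decayed_boost([400])
print(early > late)
```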

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 29 Aug 2018, at 15:37, Doug Turnbull
> <dturnb...@opensourceconnections.com> wrote:
> 
> You can also insert a token at the beginning of the query during analysis
> using a char filter. I call these kinds of boundary tokens "sentinel
> tokens". So a phrase search for "red shoes" becomes "<SENT_BEG> red shoes".
> You can add some slop to allow for permissible distance (with
> 
> You can also use the Limit Token Count Token Filter and create a copyField,
> so if you want to boost on first 10 matches, just limit to 10 tokens then
> use this as a boost query
> https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-LimitTokenCountFilter
> 
> -Doug
> 
> On Wed, Aug 29, 2018 at 6:26 AM Mikhail Khludnev <m...@apache.org> wrote:
> 
>> <SpanFirst>
>> <https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-XMLQueryParser>
>> 
>> On Wed, Aug 29, 2018 at 1:19 PM Jan Høydahl <jan....@cominvent.com> wrote:
>> 
>>> Hi,
>>> 
>>> Is there an ootb way to boost term matches based on their position/offset
>>> inside a field, so that the term gets a higher score if it occurs in the
>>> beginning of the field and lower boost or a deboost if it occurs towards
>>> the end of a field?
>>> 
>>> I know that I could index the first part of the text in a new field and
>>> boost on that, but that is kind of "binary".
>>> I could also add the term offset as payload for every term and boost on
>>> that, but this should not be necessary since offset info is already part
>>> of the index?
>>> 
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>> 
>>> 
>> 
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> 
> -- 
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug
