I also tend to use "sentinel tokens" for exact match or to anchor a search. But to get a boost that decays the further down the article a match occurs, you'd need to write several such span/slop queries with varying slops, e.g. highest boost for the first 10 words, medium boost for the first 50 words, low boost for the first 150 words, and no boost below that.
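
To make the tiers concrete, here is a minimal Python sketch of the step function that such a stack of queries would approximate. The cutoffs (10, 50, 150) come from the example above; the boost values and the function names are made up for illustration. This is plain Python, not Lucene or Solr code.

```python
# Hypothetical tiers as (position cutoff, boost). The cutoffs follow the
# example in the text; the boost values are illustrative assumptions only.
TIERS = [(10, 3.0), (50, 2.0), (150, 1.5)]
NO_BOOST = 1.0


def position_boost(position, tiers=TIERS):
    """Return the boost for a term at a 0-based word position."""
    for cutoff, boost in tiers:
        if position < cutoff:
            return boost
    return NO_BOOST


def best_boost(text, term):
    """Boost for the earliest occurrence of term in whitespace-split text.

    Returns 0.0 when the term does not occur at all.
    """
    needle = term.lower()
    for position, word in enumerate(text.lower().split()):
        if word == needle:
            return position_boost(position)
    return 0.0
```

For example, `best_boost("red shoes on sale", "shoes")` falls in the first tier. Note that the sketch picks exactly one tier per match; with real nested span queries, a match in the first 10 words would also satisfy the 50-word and 150-word queries, so depending on how the clauses are combined the boosts may add up instead.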
As I wrote in my initial mail, we can do such workarounds, or play with payloads etc. But my real question is whether, and how, it is possible to factor the actual term offset information of a matching term into the scoring algorithm. Would you need to implement your own Scorer/Weight implementation?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 29 Aug 2018, at 15:37, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:
>
> You can also insert a token at the beginning of the query during analysis
> using a char filter. I call these sort of boundary tokens "sentinel
> tokens". So a phrase search for "red shoes" becomes "<SENT_BEG> red shoes".
> You can add some slop to allow for permissible distance (with
>
> You can also use the Limit Token Count Token Filter and create a copyField,
> so if you want to boost on first 10 matches, just limit to 10 tokens then
> use this as a boost query
> https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-LimitTokenCountFilter
>
> -Doug
>
> On Wed, Aug 29, 2018 at 6:26 AM Mikhail Khludnev <m...@apache.org> wrote:
>
>> <SpanFirst>
>> <https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-XMLQueryParser>
>>
>> On Wed, Aug 29, 2018 at 1:19 PM Jan Høydahl <jan....@cominvent.com> wrote:
>>
>>> Hi,
>>>
>>> Is there an ootb way to boost term matches based on their position/offset
>>> inside a field, so that the term gets a higher score if it occurs in the
>>> beginning of the field and lower boost or a deboost if it occurs towards
>>> the end of a field?
>>>
>>> I know that I could index the first part of the text in a new field and
>>> boost on that, but that is kind of "binary".
>>> I could also add the term offset as payload for every term and boost on
>>> that, but this should not be necessary since offset info is already part of
>>> the index?
>>>
>>> --
>>> Jan Høydahl, search solution architect
>>> Cominvent AS - www.cominvent.com
>>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
> --
> CTO, OpenSource Connections
> Author, Relevant Search
> http://o19s.com/doug