http://splainer.io/ from the gents at OpenSourceConnections is pretty good
for this sort of thing, I find…
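
For anyone who wants to look at the raw numbers first: Solr returns a per-document score explanation when debugQuery=true is set, which shows how much the title and content clauses each contribute. Below is a minimal sketch that only builds such a request URL, using edismax and the 8x title boost suggested further down the thread. The base URL and the core name "docs" are hypothetical placeholders, not taken from the thread; the field names title and content are Tom's.

```python
from urllib.parse import urlencode

# Assumptions (not from the thread): the Solr base URL and the core name
# are hypothetical placeholders; adjust them to your installation.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "docs"


def build_select_url(user_query, title_boost=8):
    """Build a /select URL for an edismax query that boosts title over content."""
    params = {
        "q": user_query,
        "defType": "edismax",
        # qf: fields searched for individual terms, with per-field boosts.
        "qf": f"title^{title_boost} content",
        # pf: extra boost when the whole query matches as a phrase.
        # The value is just the field list, with no "pf=" prefix.
        "pf": f"title^{title_boost} content",
        "fl": "title,score",
        # Ask Solr to include the score explanation for each hit.
        "debugQuery": "true",
        "wt": "json",
    }
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"


url = build_select_url("connected vehicle")
print(url)
```

Fetching that URL and reading the explain entries in the debug section of the response should show whether the content clause is outscoring the title clause, and by how much.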

Alan Woodward
www.flax.co.uk


> On 13 Jan 2017, at 16:35, Tom Chiverton <t...@extravision.com> wrote:
> 
> Well, I've tried much larger values than 8, and it still doesn't seem to do 
> the job.
> 
> For now, assume my users are searching for exact sub strings of a real title.
> 
> Tom
> 
> 
> On 13/01/17 16:22, Walter Underwood wrote:
>> I use a boost of 8 for title with no boost on the content. Both Infoseek and 
>> Inktomi settled on the 8X boost, getting there with completely different 
>> methodologies.
>> 
>> You might not want the title to completely trump the content. That causes 
>> some odd anomalies. If someone searches for “ice age 2”, do you really want 
>> every title with “2” to come before “ice age two”? Or a search for “steve 
>> jobs” to return every article with “job” or “jobs” in the title first?
>> 
>> Also, use “edismax”, not “dismax”. Dismax was obsolete in Solr 3.x, five 
>> years ago.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Jan 13, 2017, at 7:10 AM, Tom Chiverton <t...@extravision.com> wrote:
>>> 
>>> I have a few hundred documents with title and content fields.
>>> 
>>> I want a match in title to trump matches in content. If I search for 
>>> "connected vehicle", a news article that has that phrase in the content 
>>> shouldn't be ranked higher than the page that has it in the title.
>>> 
>>> I have tried dismax with qf=title^2 as well as several other variants with 
>>> the standard query parser (like q=title:"foo"^2 OR content:"foo"), but 
>>> documents without the search term in the title still come out before those 
>>> with the term in the title when ordered by score.
>>> 
>>> Is there something I am missing?
>>> 
>>> From the docs, something like q=title:"connected vehicle"^2 OR 
>>> content:"connected vehicle" should have worked? Even using ^100 didn't 
>>> help.
>>> 
>>> I tried with the dismax parser using
>>> 
>>>       "q": "Connected Vehicle",
>>>       "defType": "dismax",
>>>       "indent": "true",
>>>       "qf": "title^2000 content",
>>>       "pf": "pf=title^4000 content^2",
>>>       "sort": "score desc",
>>>       "wt": "json",
>>> 
>>> but that was not better. If I remove content from pf/qf then documents seem 
>>> to rank correctly.
>>> Example query and results (content omitted): http://pastebin.com/5EhrRJP8 
>>> with managed-schema http://pastebin.com/mdraWQWE
>>> 
>>> -- 
>>> Tom Chiverton
>>> Lead Developer
>> 
> 
