1) i would strongly advise you against falling into the trap of thinking 
things like "Wiki posts should always be returned higher than blog posts" 
... unless you truly want *any* wiki post that matches your keywords, no 
matter how tangentially and how poorly, to come back "higher" on the list 
of results than any blog post -- even if that blog post is 100% dedicated 
to the keywords the user searched for.

if that's really what you want, then all you need is "sort=doc_type desc, 
score desc" where you assign a numeric doc_type value at index time -- 
but i assure you, it's a terrible idea.

2) in general, what you are interested in is "domain boosting" ... where,
because of your specific domain knowledge, you know that certain 
documents should generally score higher. how much higher is an art form 
that, again, will largely depend on the specifics of your domain, but 
you will most likely want it to be something you can tweak and tune.
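To make the difference between points 1 and 2 concrete, here is a small simulation, assuming three made-up results with made-up relevance scores and made-up per-type weights (real Solr scores are not normalized to 0..1, so only the ordering matters here). It contrasts "sort by type, then score" with multiplying the score by a tunable per-type boost.

```python
# Hypothetical search results: (title, doc_type, relevance score for the query).
# Scores and weights are invented purely for illustration.
results = [
    ("wiki page that mentions the keywords once", "wiki", 0.20),
    ("blog post entirely about the keywords", "blog", 0.95),
    ("forum thread about the keywords", "forum", 0.60),
]

# Point 1: "sort=doc_type desc, score desc" with a numeric doc_type
# assigned at index time (higher number = "more important" type).
TYPE_RANK = {"wiki": 3, "blog": 2, "forum": 1}

def sort_by_type_then_score(docs):
    # The type always wins; score only breaks ties within a type.
    return sorted(docs, key=lambda d: (TYPE_RANK[d[1]], d[2]), reverse=True)

# Point 2: domain boosting, where the type nudges the relevance score by a
# tunable factor instead of overriding it.
TYPE_BOOST = {"wiki": 1.5, "blog": 1.2, "forum": 1.0}

def sort_by_boosted_score(docs):
    return sorted(docs, key=lambda d: d[2] * TYPE_BOOST[d[1]], reverse=True)

for title, doc_type, score in sort_by_type_then_score(results):
    print("type-first:", doc_type, score, title)
for title, doc_type, score in sort_by_boosted_score(results):
    print("boosted:   ", doc_type, score, title)
```

With type-first sorting the barely relevant wiki page lands on top; with the multiplicative boost the dedicated blog post wins, yet between two equally relevant documents the wiki page still comes first.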

3) regardless of the specifics of the website you are dealing with, and 
the URL structure used, what really matters is how you convert the raw 
data on your website into documents to be indexed -- when you do that, 
however you do that, is when you can add fields to your documents to 
convey information like "this document is from the wiki" or "this document 
is from the forum" or "this document is a verified forum answer".  If the 
only way you can conceptually know this information is by parsing the URL, 
then so be it -- but more than likely, if you are reading this data 
directly from an authoritative source (instead of just crawling URLs), 
there are easy methods to determine this stuff.
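As a rough sketch of what "adding fields while converting raw data into documents" can look like, the following assumes hypothetical URL prefixes (/wiki/, /blog/, /forum/) and a hypothetical per-post metadata key "answer_status" standing in for whatever the forum software actually records about verified and suggested answers.

```python
from urllib.parse import urlparse

# Hypothetical URL prefixes; adjust to the site's real structure.
PATH_PREFIXES = {
    "/wiki/": "wiki",
    "/blog/": "blog",
    "/forum/": "forum",
}

def doc_type_for(url, metadata=None):
    """Derive a doc_type value from the URL path, refined by post metadata.

    "answer_status" is an assumed metadata key, not a real Solr or forum API.
    """
    path = urlparse(url).path
    doc_type = "other"
    for prefix, value in PATH_PREFIXES.items():
        if path.startswith(prefix):
            doc_type = value
            break
    # Forum answer state cannot be seen in the path, only in the metadata.
    if doc_type == "forum" and metadata:
        status = metadata.get("answer_status")
        if status == "verified":
            doc_type = "forum_verified"
        elif status == "suggested":
            doc_type = "forum_suggested"
    return doc_type

def to_solr_doc(doc_id, url, title, body, metadata=None):
    # Plain dict mirroring the fields that would be sent to Solr; the field
    # names other than doc_type are illustrative.
    return {
        "id": doc_id,
        "url": url,
        "title": title,
        "body": body,
        "doc_type": doc_type_for(url, metadata),
    }
```

If the data comes straight from the authoritative source, the same doc_type would normally be read from the source records rather than parsed out of the URL.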

        . . .

My initial suggestion would be to create a simple field called 
"doc_type" containing values like "wiki", "blog", "forum", 
"forum_verified", and "forum_suggested" ... with those values *indexed* 
for each doc, you can then use the ExternalFileField to associate a 
numeric value with each of those special values, and you can tune & tweak 
those numeric values w/o re-indexing.  Then you should look into how boost 
functions work to make those numeric values an input into the final score 
calculations.  
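A minimal sketch of the tuning side, under these assumptions: a hypothetical field named doc_type_boost whose ExternalFileField type uses keyField="doc_type", made-up weights, and a query using edismax's multiplicative boost parameter (qf fields are illustrative too). ExternalFileField reads lines of the form key=value from a file named external_<fieldname> in the core's data directory; check the docs for your Solr version on where the file lives and when it is reloaded.

```python
import os
from urllib.parse import urlencode

# Assumed schema.xml declarations (roughly), shown here only as a comment:
#   <fieldType name="typeBoostFile" class="solr.ExternalFileField"
#              keyField="doc_type" defVal="1" valType="pfloat"
#              stored="false" indexed="false"/>
#   <field name="doc_type_boost" type="typeBoostFile"/>

# Hypothetical weights; edit and rewrite the file to re-tune w/o re-indexing.
DOC_TYPE_BOOSTS = {
    "wiki": 3.0,
    "blog": 2.0,
    "forum_verified": 1.8,
    "forum_suggested": 1.4,
    "forum": 1.0,
}

def write_external_file(data_dir, field_name, values):
    """Write one key=value line per doc_type into external_<field_name>."""
    path = os.path.join(data_dir, "external_" + field_name)
    # Temp name deliberately does not start with "external_" so Solr
    # cannot pick up a half-written file.
    tmp_path = os.path.join(data_dir, ".tmp_external_" + field_name)
    with open(tmp_path, "w", encoding="utf-8") as f:
        for key, value in sorted(values.items()):
            f.write(f"{key}={value}\n")
    os.replace(tmp_path, path)  # atomic swap on the same filesystem
    return path

def boosted_query_params(user_query, field_name):
    # edismax multiplies the relevance score by the "boost" function; a bare
    # field name used as a function returns that field's value.
    return urlencode({
        "q": user_query,
        "defType": "edismax",
        "qf": "title body",
        "boost": field_name,
    })
```

A multiplicative boost keeps relevance in charge (point 2 above); an additive bf function is the other common option, but its effect depends on the scale of the raw scores.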

In the long run, however, you may want to consider indexing a general
"importance" value for each doc that you re-compute periodically, based not 
just on the *type* of the document but also on things like the number of 
page views, the number of votes for forum answers to be "verified", etc...
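One possible shape for such a periodically recomputed "importance" value is sketched below. The formula, the weights, the vote cap, and the input keys (id, doc_type, page_views, verify_votes) are all hypothetical starting points meant to be tuned, not a recommended scoring model.

```python
import math

# Hypothetical per-type base weights.
TYPE_WEIGHT = {"wiki": 3.0, "blog": 2.0, "forum_verified": 1.8,
               "forum_suggested": 1.4, "forum": 1.0}

def importance(doc_type, page_views, verify_votes):
    """Combine type, popularity and votes into one multiplicative factor."""
    base = TYPE_WEIGHT.get(doc_type, 1.0)
    # log1p dampens page views so one very popular page cannot swamp the rest.
    popularity = 1.0 + 0.1 * math.log1p(page_views)
    # Cap votes at 20 so they contribute at most a 2x factor.
    votes = 1.0 + 0.05 * min(verify_votes, 20)
    return base * popularity * votes

def recompute(docs):
    """Map each doc id to its importance, e.g. for a nightly batch job."""
    return {
        d["id"]: round(importance(d["doc_type"], d["page_views"], d["verify_votes"]), 4)
        for d in docs
    }
```

The resulting id-to-value map could either be indexed as a regular numeric field or written to an ExternalFileField file keyed on the uniqueKey, the latter avoiding a re-index every time the numbers change.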


More information about "domain boosting"...

https://people.apache.org/~hossman/ac2012eu/
http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55822630





On Fri, 6 Dec 2013, Jim Glynn wrote:

: Date: Fri, 6 Dec 2013 13:10:59 -0800 (PST)
: From: Jim Glynn <jrgl...@hotmail.com>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: Prioritize search returns by URL path?
: 
: Thanks all. Yes, we can differentiate between content types by URL.
: Everything else being equal, Wiki posts should always be returned higher
: than blog posts, and blog posts should always be returned higher than forum
: posts.
: 
: Within forum posts, we want to rank Verified answered and Suggested answered
: posts higher than unanswered posts. These cannot be identified via path -
: only via metadata attached to the individual post. Any suggestions?
: 
: @Alex, I'll investigate the references you provided. Thanks!
: 
: 
: 
: 

-Hoss
http://www.lucidworks.com/
