Boosting relevance as terms get nearer to each other

Michael _ Thu, 13 Aug 2009 09:01:56 -0700

Hello,
I'd like documents to score higher the nearer the user's search terms are to
each other.  For example, if a user searches for


  a AND b AND c

the standard query handler should return all documents with [a], [b], and [c]
in them, but documents matching the phrase "a b c" should rank above those
with "a x b c", which in turn should rank above those with "b x y c z a", etc.

To accomplish this, I thought I might replace the user's query with

  "a b c"~1000000000

hoping that the sloppy phrase scores higher and higher the closer together
[a], [b], and [c] appear.  This doesn't seem to be the case in my experiments;
when I debug the query, there's no component of the score based on how close
together [a], [b], and [c] are.  I also suspect that this would make my
queries a whole lot slower -- in reality my users' queries get expanded
quite a bit already, so I'd need to add many sloppy phrase clauses.
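
For concreteness, here is a minimal sketch of the kind of query-side rewrite
I mean, written in Python. The helper name build_proximity_query and the
default slop of 100 are made up for illustration, and it assumes the terms
are plain tokens that need no escaping. It keeps every term required and
adds the sloppy phrase as an optional clause, using ordinary Lucene query
syntax. Whether that optional clause actually scores by distance is exactly
what I'm unsure about.

```python
def build_proximity_query(terms, slop=100):
    """Hypothetical helper: require every term, add an optional sloppy phrase.

    Assumes each term is a plain token with no characters that need escaping.
    """
    # "+a +b +c" makes every term mandatory, like a AND b AND c.
    required = " ".join("+" + term for term in terms)
    # '"a b c"~100' is an optional phrase clause with slop; it can only add score.
    phrase = '"' + " ".join(terms) + '"~' + str(slop)
    return required + " " + phrase


print(build_proximity_query(["a", "b", "c"]))
```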

Perhaps instead I could modify the Standard query handler to examine the
distance between all ANDed tokens, and boost proportionally to the inverse
of their average distance apart.  I've never modified a query handler before
so I have no idea if this is possible.
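
To make "inverse of their average distance" concrete, here is a toy sketch
in Python of the number I have in mind. It is not Solr code: the function
name proximity_boost is made up, and it assumes every term occurs in the
document and looks only at the first occurrence of each.

```python
from itertools import combinations


def proximity_boost(doc_tokens, terms):
    """Toy boost: inverse of the average pairwise distance between terms.

    Assumes every term is present in doc_tokens and uses only the first
    occurrence of each term (both are simplifications for illustration).
    """
    positions = [doc_tokens.index(term) for term in terms]
    pairs = list(combinations(positions, 2))
    if not pairs:
        return 1.0  # a single term has no distance to measure
    average = sum(abs(p - q) for p, q in pairs) / len(pairs)
    return 1.0 / average if average > 0 else 1.0


exact = proximity_boost("a b c".split(), ["a", "b", "c"])
near = proximity_boost("a x b c".split(), ["a", "b", "c"])
far = proximity_boost("b x y c z a".split(), ["a", "b", "c"])
print(exact > near > far)
```

With the three example documents above, the boost falls as the terms spread
out, which is the ordering I'm after.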

Any suggestions on what approach I should take?  The less I have to modify
Solr, the better -- I'd prefer a query-side solution to writing a plugin,
and writing a plugin to forking the standard query handler.

Thanks in advance!
Michael
