Re: Index time boosts, payloads, and long query strings

Girish Redekar Mon, 23 Nov 2009 01:14:32 -0800

Thanks Erick!

After reading your answer, and re-reading the Solr wiki, I realized my
folly. I used to think that index-time boosts when applied on a per-field
basis are equivalent to query time boosts to that field.


To ensure that my new understanding is correct, I'll state it in my words.
Index time boosts will determine boost for a *document* if it is counted as
a hit. Query time boosts give you control on boosting the occurrence of a
query in a specific field.

Please correct me if I'm wrong (again) :-)
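
To make the query-time side concrete, here is a minimal sketch of building
a query with a different boost per field. The field names (title, body),
the boost values, and the helper name boosted_query are only illustrative
assumptions, and the terms are assumed to contain no characters that need
escaping in Lucene query syntax.

```python
from urllib.parse import urlencode


def boosted_query(terms, field_boosts):
    """Build a Lucene-syntax query that looks for the terms in each field,
    giving each field's clause its own query-time boost."""
    joined = " ".join(terms)
    clauses = []
    for field, boost in field_boosts.items():
        # field:(a b)^4.0 applies the boost to matches in that field only
        clauses.append(f"{field}:({joined})^{boost}")
    # OR the per-field clauses so a hit in any field counts, with matches
    # in higher-boosted fields contributing more to the score
    return " OR ".join(clauses)


# Hypothetical field names and weights, purely for illustration
q = boosted_query(["solr", "payloads"], {"title": 4.0, "body": 1.0})
params = urlencode({"q": q, "fl": "id,score"})
print(q)
```

The boost here is attached to the query clause, so it says "matches in
title matter more than matches in body" for every document alike, which is
different from an index-time boost stored with one particular document.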

Girish Redekar
http://girishredekar.net


On Sun, Nov 22, 2009 at 8:25 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> I still think they are apples and oranges. If you boost *all* titles,
> you're effectively boosting none of them. Index time boosting
> expresses "this document's title is more important than other
> document titles." What I think you're after is "titles are more
> important than other parts of the document."
>
> For this latter, you're talking query-time boosting. Boosting only
> really makes sense if there are multiple clauses, something
> like title:important OR body:unimportant. If this is true, speed
> is irrelevant, you need correct behavior.
>
> Not that I think you'd notice either way. Modern computers
> can do a LOT of FLOPS/sec. Here's an experiment: time
> some queries (but beware of timing the very first ones, see
> the Wiki) with boosts and without boosts. I doubt you'll see
> enough difference to matter (but please do report back if you
> do, it'll further my education <G>).
>
> But, depending on your index structure, you may get this
> anyway. Generally, matches on shorter fields weigh more
> in the score calculations than on longer fields. If you have
> fields like title and body and you are querying on title:term OR
> body:term, documents with term in the title will tend toward
> higher scores.
>
> But before putting too much effort into this, do you have any
> evidence that the default behavior is unsatisfactory? Because
> unless and until you do, I think this is a distraction <G>...
>
> Best
> Erick
>
> On Sun, Nov 22, 2009 at 8:37 AM, Girish Redekar
> <girish.rede...@aplopio.com> wrote:
>
> > Hi Erick -
> >
> > Maybe I mis-wrote.
> >
> > My question is: would "title:any_query^4.0" be faster/slower than applying
> > index time boost to the field title. Basically, if I take *every* user
> > query and search for it in title with boost (say, 4.0) - is it different
> > than saying field title has boost 4.0?
> >
> > Cheers,
> > Girish Redekar
> > http://girishredekar.net
> >
> >
> > On Sun, Nov 22, 2009 at 2:02 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > > I'll take a whack at index vs. query boosting. They are expressing very
> > > different concepts. Let's claim we're interested in boosting the title
> > > field....
> > >
> > > Index time boosting is expressing "this document's title is X more
> > > important than a normal document title". It doesn't matter *what* the
> > > title is, any query that matches on anything in this document's title
> > > will give this document a boost. I might use this to give preferential
> > > treatment to all encyclopedia entries or something.
> > >
> > > Query time boosting, like "title:solr^4.0" expresses "Any document with
> > > solr in its title is more important than documents without solr in the
> > > title". This really only makes sense if you have other clauses that
> > > might cause a document *without* solr in the title to match......
> > >
> > > Since they are doing different things, efficiency isn't really relevant.
> > >
> > > HTH
> > > Erick
> > >
> > >
> > > On Sat, Nov 21, 2009 at 2:13 AM, Girish Redekar
> > > <girish.rede...@aplopio.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm relatively new to Solr/Lucene, and am using Solr (and not Lucene
> > > > directly) primarily because I can use it without writing Java code
> > > > (the rest of my project is coded in Python).
> > > >
> > > > My application has the following requirements:
> > > > (a) ability to search over multiple fields, each with different weight
> > > > (b) If possible, I'd like to have the ability to add extra/diminished
> > > > weights to particular tokens within a field
> > > > (c) My query strings have large lengths (50-100 words)
> > > > (d) My index is 500K+ documents
> > > >
> > > > 1) The way to (a) is field boosting (right?). My question is: Is all
> > > > field boosting done at query time? Even if I give index time boosts to
> > > > fields? Is there a performance advantage in boosting fields at index
> > > > time vs. using something like fieldname:querystring^boost?
> > > > 2) From what I've read, it seems that I can do (b) using payloads.
> > > > However, as this link
> > > > (http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/)
> > > > suggests, I will have to write a payload aware Query Parser. Wanted to
> > > > confirm if this is indeed the case - or is there an out-of-box way to
> > > > implement payloads (am using Solr 1.4)?
> > > > 3) For my project, the user fills multiple text boxes (for each query).
> > > > I combine these into a single query (with different treatment for
> > > > contents of each text box). Consequently, my query looks something like
> > > > (fieldname1: queryterm1 queryterm2^2.0 queryterm3^3.0 +queryterm4)^1.0
> > > > Are there any guidelines for improving performance of such a system
> > > > (sorry, this bit is vague)?
> > > >
> > > > Any help with this will be great!
> > > >
> > > > Girish Redekar
> > > > http://girishredekar.net
> > > >
> > >
> >
>
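
The timing experiment suggested in the thread (compare queries with and
without boosts, and don't count the very first requests) could be sketched
roughly as below. This is only an illustration under assumptions: the
helper names time_queries and solr_search are made up here, and the URL
http://localhost:8983/solr/select is an assumed local Solr endpoint that
would need adjusting for a real installation.

```python
import statistics
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def time_queries(run_query, queries, warmup=3, repeats=5):
    """Return the median wall-clock seconds per query, ignoring warm-up runs."""
    # The first requests pay for cold caches, so run a few and discard them
    for q in queries[:warmup]:
        run_query(q)
    samples = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            run_query(q)
            samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def solr_search(q, base_url="http://localhost:8983/solr/select"):
    """Send one query to Solr over HTTP (base_url is an assumed endpoint)."""
    url = base_url + "?" + urlencode({"q": q, "rows": 10})
    with urlopen(url) as resp:
        return resp.read()


# Example comparison, left commented out because it needs a running server:
# plain = ["title:solr OR body:solr"]
# boosted = ["title:solr^4.0 OR body:solr"]
# print(time_queries(solr_search, plain), time_queries(solr_search, boosted))
```

Using the median rather than the mean keeps one slow outlier from
dominating the comparison; with 50-100 word queries it would make sense to
feed in a sample of real user queries rather than single terms.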
