Yep <G>....

On Mon, Nov 23, 2009 at 4:13 AM, Girish Redekar <girish.rede...@aplopio.com> wrote:
> Thanks Erick!
>
> After reading your answer, and re-reading the Solr wiki, I realized my
> folly. I used to think that index-time boosts, when applied on a per-field
> basis, are equivalent to query-time boosts to that field.
>
> To ensure that my new understanding is correct, I'll state it in my words.
> Index-time boosts will determine the boost for a *document* if it is counted as
> a hit. Query-time boosts give you control over boosting the occurrence of a
> query in a specific field.
>
> Please correct me if I'm wrong (again) :-)
>
> Girish Redekar
> http://girishredekar.net
>
>
> On Sun, Nov 22, 2009 at 8:25 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > I still think they are apples and oranges. If you boost *all* titles,
> > you're effectively boosting none of them. Index time boosting
> > expresses "this document's title is more important than other
> > document titles." What I think you're after is "titles are more
> > important than other parts of the document."
> >
> > For this latter, you're talking query-time boosting. Boosting only
> > really makes sense if there are multiple clauses, something
> > like title:important OR body:unimportant. If this is true, speed
> > is irrelevant, you need correct behavior.
> >
> > Not that I think you'd notice either way. Modern computers
> > can do a LOT of FLOPS/sec. Here's an experiment: time
> > some queries (but beware of timing the very first ones, see
> > the Wiki) with boosts and without boosts. I doubt you'll see
> > enough difference to matter (but please do report back if you
> > do, it'll further my education <G>).
> >
> > But, depending on your index structure, you may get this
> > anyway. Generally, matches on shorter fields weigh more
> > in the score calculations than on longer fields. If you have
> > fields like title and body and you are querying on title:term OR
> > body:term, documents with term in the title will tend toward
> > higher scores.
> >
> > But before putting too much effort into this, do you have any
> > evidence that the default behavior is unsatisfactory? Because
> > unless and until you do, I think this is a distraction <G>...
> >
> > Best
> > Erick
> >
> > On Sun, Nov 22, 2009 at 8:37 AM, Girish Redekar
> > <girish.rede...@aplopio.com> wrote:
> >
> > > Hi Erick -
> > >
> > > Maybe I mis-wrote.
> > >
> > > My question is: would "title:any_query^4.0" be faster/slower than applying
> > > an index-time boost to the field title? Basically, if I take *every* user
> > > query and search for it in title with a boost (say, 4.0), is it different
> > > from saying field title has boost 4.0?
> > >
> > > Cheers,
> > > Girish Redekar
> > > http://girishredekar.net
> > >
> > >
> > > On Sun, Nov 22, 2009 at 2:02 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > > > I'll take a whack at index vs. query boosting. They are expressing very
> > > > different concepts. Let's claim we're interested in boosting the title
> > > > field....
> > > >
> > > > Index time boosting is expressing "this document's title is X more
> > > > important than a normal document title". It doesn't matter *what* the
> > > > title is, any query that matches on anything in this document's title
> > > > will give this document a boost. I might use this to give preferential
> > > > treatment to all encyclopedia entries or something.
> > > >
> > > > Query time boosting, like "title:solr^4.0" expresses "Any document with
> > > > solr in its title is more important than documents without solr in the
> > > > title". This really only makes sense if you have other clauses that
> > > > might cause a document *without* solr in the title to match......
> > > >
> > > > Since they are doing different things, efficiency isn't really relevant.
> > > >
> > > > HTH
> > > > Erick
> > > >
> > > >
> > > > On Sat, Nov 21, 2009 at 2:13 AM, Girish Redekar
> > > > <girish.rede...@aplopio.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm relatively new to Solr/Lucene, and am using Solr (and not Lucene
> > > > > directly) primarily because I can use it without writing Java code
> > > > > (the rest of my project is coded in Python).
> > > > >
> > > > > My application has the following requirements:
> > > > > (a) ability to search over multiple fields, each with a different weight
> > > > > (b) If possible, I'd like to have the ability to add extra/diminished
> > > > > weights to particular tokens within a field
> > > > > (c) My query strings have large lengths (50-100 words)
> > > > > (d) My index is 500K+ documents
> > > > >
> > > > > 1) The way to (a) is field boosting (right?). My question is: Is all
> > > > > field boosting done at query time? Even if I give index time boosts to
> > > > > fields? Is there a performance advantage in boosting fields at index
> > > > > time vs. using something like fieldname:querystring^boost?
> > > > > 2) From what I've read, it seems that I can do (b) using payloads.
> > > > > However, as this link
> > > > > (http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/)
> > > > > suggests, I will have to write a payload-aware Query Parser. Wanted to
> > > > > confirm if this is indeed the case, or is there an out-of-the-box way
> > > > > to implement payloads (am using Solr 1.4)?
> > > > > 3) For my project, the user fills multiple text boxes (for each query).
> > > > > I combine these into a single query (with different treatment for the
> > > > > contents of each text box).
> > > > > Consequently, my query looks something like
> > > > > (fieldname1: queryterm1 queryterm2^2.0 queryterm3^3.0 +queryterm4)^1.0
> > > > > Are there any guidelines for improving the performance of such a
> > > > > system (sorry, this bit is vague)?
> > > > >
> > > > > Any help with this will be great!
> > > > >
> > > > > Girish Redekar
> > > > > http://girishredekar.net
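
For anyone reading this thread later: the query-time boosting discussed above (something like title:term^4.0 OR body:term) can be put together from Python without writing any Java. What follows is only a rough sketch and was not posted by anyone in the thread. The field names title and body, the boost values, and the helper names escape_term, build_boosted_query and build_select_params are all assumptions for illustration. It only builds the query string and the URL-encoded parameters; it does not talk to a Solr server, and it does not handle bare AND/OR/NOT words in the user text.

```python
from urllib.parse import urlencode

# Characters that the Lucene query syntax treats as special in at least
# some versions; backslash-escaping them makes the parser read them literally.
_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(term):
    """Backslash-escape query-syntax characters in a single term."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in term)


def build_boosted_query(user_text, field_boosts):
    """OR the same user terms across several fields, each with its own boost.

    field_boosts is a list of (field_name, boost) pairs, e.g.
    [("title", 4.0), ("body", 1.0)]. These names are hypothetical.
    """
    terms = " ".join(escape_term(t) for t in user_text.split())
    clauses = ["%s:(%s)^%s" % (field, terms, boost)
               for field, boost in field_boosts]
    return " OR ".join(clauses)


def build_select_params(user_text, field_boosts, rows=10):
    """URL-encode the boosted query as 'q' plus a 'rows' limit."""
    query = build_boosted_query(user_text, field_boosts)
    return urlencode({"q": query, "rows": rows})


if __name__ == "__main__":
    print(build_boosted_query("solr boosting", [("title", 4.0), ("body", 1.0)]))
    print(build_select_params("solr", [("title", 4.0)]))
```

Because every query gets the same field weights here, this expresses "titles matter more than bodies" at query time, which is the case Erick distinguishes from index-time boosting of individual documents.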