On Thu, Nov 17, 2011 at 6:59 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
> : 1. "omitTermFreqAndPositions" is very straightforward but if I avoid
> : positions I'll refuse to serve phrase queries. I had searched for this in
>
> but do you really need phrase queries on your "cat" field? i thought the
> point was to have simple matching on those terms?
>

Yes, I need to match phrases. Consider the following documents:

Doc1 - categories: "teak wooden chair", "bamboo wooden chair"
Doc2 - categories: "wooden chair"
Doc3 - categories: "plastic chair", "wooden cupboard"

A query "wooden chair" should return doc1 and doc2 with equal scores (provided the other fields generate the same score), and doc3 should be excluded. A non-phrase match would include doc3 as well.

> : 2. Function query seemed nice (though strange because I never used it
> : before) and I gave it a few hours but that too did not seem to solve my
> : requirement. The "artificial" score we are generating is getting multiplied
> : into rest of the score which includes score due to "cat" field as well. (I
> : can not remove "cat" from "qf" as I have to search there). It is only that
> : I don't want this field's score on the basis of matching "tf".
>
> I don't think i realized you were using dismax ... if you just want a
> match on "cat" to help determine if the document is a match, but not have
> *any* impact on score, you could just set the qf boost to 0 (ie:
> qf=title^10 cat^0) but i'm not sure if that's really what you want.
>

Well, this is almost what I want. (Thanks for telling me about ^0; I learned something new.) I wanted a constant score for a match in "cat", and I did not want the frequency of matches in "cat" to affect the score, which can be done this way. But I definitely want a match in "cat" to generate some score, equal to that of a single match (tf = 1), so that less important fields like "description" do not end up with a higher boost than "cat". Writing ^0 produces a score of 0.00 for a match in "cat", while a match in "description" will generate some positive score.

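To make the tf = 1 requirement concrete, here is a minimal standalone sketch (plain Java, no Lucene dependency). `defaultTf` uses the same formula as `DefaultSimilarity.tf(float)`, i.e. the square root of the frequency; `constantTf` is a hypothetical replacement that treats any number of matches as a single one. The class and method names are made up for illustration only. With the example documents above, the phrase "wooden chair" occurs twice in doc1 and once in doc2, so the default formula scores them differently while the constant one does not.

```java
// Standalone illustration; the names are hypothetical and this is not Lucene code.
final class TfSketch {
    // Same formula as Lucene's DefaultSimilarity.tf(float): sqrt(freq).
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Hypothetical replacement: any match counts as exactly one occurrence.
    static float constantTf(float freq) {
        return freq > 0f ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        // freq=2 corresponds to doc1 (two phrase matches), freq=1 to doc2.
        for (float freq : new float[] {0f, 1f, 2f, 4f}) {
            System.out.printf("freq=%.0f default=%.3f constant=%.1f%n",
                    freq, defaultTf(freq), constantTf(freq));
        }
    }
}
```

This only shows the arithmetic; where such a formula could be plugged in per field is a separate question.
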
> : After spending some hours on function queries I finally reached on
> : following query
>
> Honestly: i'm not really following what you tried there because of the
> formatting applied by your email client ... it seemed to be making tons of
> hyperlinks out of pieces of the URL.
>
> Looking at your query explanation however the problem seems to be that you
> are still using the relevancy score of the matches on the "cat" field,
> instead of *just* using the function boost...
>

I did try *just* using the function boost, i.e. I removed "cat" from "qf", but then the query did not return documents whose only match is in the "cat" field. The query was something like the following (I hope it is clear this time):

<url>?q={!boost b=$cat_boost v=$main_query}
*&main_query={!dismax qf="title" v=$qry}*
&cat_boost={!func}map(query({!field f=cat v=$qry},-1),0,1000,5,1)
&qry=chair
...

(Note: I slightly modified the cat_boost parameter to use only a single map() function in its 5-argument form.)

It gave me just the two docs where "title" contained the query word (chair).

I also tried changing main_query to

*&main_query={!dismax qf="title cat" v=$qry}*

which gave me all 4 required docs, but with scores that also vary on the basis of "cat", and to

*&main_query={!dismax qf="title cat^0" v=$qry}*

which gave me all the required docs with a constant (0.0) "cat" score. But when I add "description" to qf, even docs with the worst match in "description" will score higher than docs with a good match in "cat", which is not what is required.

> : But debugging the query showed that the boost value ($cat_boost) is being
> : multiplied into a value which is generated with the help of "cat" field
> : thus resulting in different scores for 1 and 3 (similarly for 2 and 4).
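
The multiplication described in the quoted text above can be sketched in a few lines of plain Java. This is only an illustration of the arithmetic, not Solr code: `map` mirrors the documented behaviour of Solr's 5-argument map(x,min,max,target,default), `catBoost` stands in for map(query(...,-1),0,1000,5,1), and the score values in `main` are hypothetical numbers, not taken from any real explain output.

```java
// Standalone illustration; class name, method names and scores are hypothetical.
final class BoostSketch {
    // Mirrors Solr's map(x,min,max,target,default):
    // target if min <= x <= max, otherwise the default value.
    static float map(float x, float min, float max, float target, float def) {
        return (x >= min && x <= max) ? target : def;
    }

    // catScore is the score of the "cat" subquery, or -1 when it does not match,
    // as produced by query(...,-1).
    static float catBoost(float catScore) {
        return map(catScore, 0f, 1000f, 5f, 1f);
    }

    // {!boost} multiplies the main query score by the boost value.
    static float boosted(float mainScore, float catScore) {
        return mainScore * catBoost(catScore);
    }

    public static void main(String[] args) {
        // The boost is the same for a weak and a strong "cat" match ...
        System.out.println(catBoost(0.2f) + " " + catBoost(0.8f));
        // ... so any remaining difference comes from the main score itself,
        // which still includes "cat" whenever "cat" is listed in qf.
        System.out.println(boosted(0.5f, 0.8f) + " vs " + boosted(0.75f, 0.8f));
    }
}
```

In other words, the boost factor is constant per match; the scores differ only because the main query score it multiplies already depends on "cat".
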
> :
> : 1.2942866 = (MATCH) boost(+(title:chair | cat:chair)~0.01
> : (),map(query(cat:chair,def=-1.0),0.0,1000.0,1.0)), product of:
>
> ...my point before was to take "cat:chair" out of the "main" part of your
> query, and *only* put it in the boost function. if you are using dismax,
> the "qf=cat^0" suggestion mentioned above *combined* with your boost
> function will probably get you what you want (i think)
>

Taking "cat:chair" out of main_query (the dismax equivalent is removing "cat" from "qf") or using "cat^0" did not produce the desired effect, as I described earlier.

> : I was thinking there should be some hook or plugin (or anything) which
> : could just change the score calculation formula *for a particular field*.
> : There is a function in DefaultSimilarity class - *public float tf(float
> : freq)* but that does not mention the field name. Is there a possibility to
> : look into this direction?
>
> on trunk, there is a distinct Similarity object per fieldtype, so you
> could certainly look at that -- but you are correct that in 3x there is no
> way to override the tf() function on a per field basis.
>

I'll definitely look at the Similarity class. I hope there are no performance degradation issues with it :)

>
> -Hoss
>

Thank you very much.

-- 
Regards,
Samar