RE: A question about scoring function in Lucene

2004-12-15 Thread Chuck Williams
From: Doug Cutting [mailto:[EMAIL PROTECTED] > Sent: Wednesday, December 15, 2004 12:35 PM > To: Lucene Users List > Subject: Re: A question about scoring function in Lucene > > Chris Hostetter wrote: > > For example, using the current scoring equation, if I do a

Re: A question about scoring function in Lucene

2004-12-15 Thread Doug Cutting
Chris Hostetter wrote: For example, using the current scoring equation, if I do a search for "Doug Cutting" and the results/scores I get back are... 1: 0.9 2: 0.3 3: 0.21 4: 0.21 5: 0.1 ...then there are at least two meaningful pieces of data I can glean:

Re: A question about scoring function in Lucene

2004-12-15 Thread Doug Cutting
Otis Gospodnetic wrote: There is one case that I can think of where this 'constant' scoring would be useful, and I think Chuck already mentioned this 1-2 months ago. For instance, having such scores would allow one to create alert applications where queries run by some scheduler would trigger an al

Re: A question about scoring function in Lucene

2004-12-15 Thread Otis Gospodnetic
There is one case that I can think of where this 'constant' scoring would be useful, and I think Chuck already mentioned this 1-2 months ago. For instance, having such scores would allow one to create alert applications where queries run by some scheduler would trigger an alert whenever the score i

Re: A question about scoring function in Lucene

2004-12-15 Thread Chris Hostetter
: I question whether such scores are more meaningful. Yes, such scores : would be guaranteed to be between zero and one, but would 0.8 really be : meaningful? I don't think so. Do you have pointers to research which : demonstrates this? E.g., when such a scoring method is used, that : threshold

Re: A question about scoring function in Lucene

2004-12-15 Thread Doug Cutting
Chuck Williams wrote: I believe the biggest problem with Lucene's approach relative to the pure vector space model is that Lucene does not properly normalize. The pure vector space model implements a cosine in the strictly positive sector of the coordinate space. This is guaranteed intrinsically

RE: A question about scoring function in Lucene

2004-12-15 Thread Chuck Williams
> From: Nhan Nguyen Dang [mailto:[EMAIL PROTECTED] > Sent: Wednesday, December 15, 2004 1:18 AM > To: Lucene Users List > Subject: RE: A question about scoring function in Lucene > > Thanks for your answer, > In Lucene scoring function, they use only norm_q, > but f

RE: A question about scoring function in Lucene

2004-12-15 Thread Nhan Nguyen Dang
w.emeraldinsight.com/rpsv/cgi-bin/emft.pl > if you sign up for an eval. > > It's easy to correct for idf^2 by using a custom > Similarity that takes a final square root. > > Chuck > > > -----Original Message----- > > From: Vikas Gupta [mailto:[EMAIL PROTECTED]

RE: A question about scoring function in Lucene

2004-12-14 Thread Chuck Williams
From: Vikas Gupta [mailto:[EMAIL PROTECTED] > Sent: Tuesday, December 14, 2004 9:32 PM > To: Lucene Users List > Subject: Re: A question about scoring function in Lucene > > Lucene uses the vector space model. To understand that: > > -Read section 2.1 of "Space op

Re: A question about scoring function in Lucene

2004-12-14 Thread Vikas Gupta
Lucene uses the vector space model. To understand that: -Read section 2.1 of "Space optimizations for Total Ranking" paper (Linked here http://lucene.sourceforge.net/publications.html) -Read section 6 to 6.4 of http://www.csee.umbc.edu/cadip/readings/IR.report.120600.book.pdf -Read section 1 of ht

A question about scoring function in Lucene

2004-12-14 Thread Nhan Nguyen Dang
Hi all, Lucene scores a document based on the correlation between the query q and document d: (this is the raw function; I am ignoring the boost_t and coord_q_d factors) score_d = sum_t( tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t) (*) Could anybody explain it in detail? Or are there any p
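
For readers following the thread, the raw formula (*) quoted in the original question can be evaluated by hand on toy numbers. The sketch below is only an illustration of that formula as written in the message, not Lucene's actual implementation: the function name raw_score, the dictionary layout, and all the numeric values are assumptions made up for this example, and norm_d_t is modelled as a simple per-term lookup.

```python
def raw_score(query_tf, doc_tf, idf, norm_q, norm_d):
    """Evaluate score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t).

    All arguments are hypothetical stand-ins for the symbols in (*):
      query_tf: term -> tf_q (term frequency in the query)
      doc_tf:   term -> tf_d (term frequency in the document)
      idf:      term -> idf_t
      norm_q:   single query normalisation value
      norm_d:   term -> norm_d_t (modelled here as a per-term lookup)
    """
    total = 0.0
    for term, tf_q in query_tf.items():
        tf_d = doc_tf.get(term, 0.0)
        if tf_d == 0.0:
            # A term missing from the document contributes nothing to the sum.
            continue
        idf_t = idf[term]
        query_weight = tf_q * idf_t / norm_q
        doc_weight = tf_d * idf_t / norm_d[term]
        total += query_weight * doc_weight
    return total


# Toy values, invented purely for illustration.
query_tf = {"doug": 1.0, "cutting": 1.0}
doc_tf = {"doug": 2.0, "lucene": 3.0}
idf = {"doug": 2.0, "cutting": 3.0, "lucene": 1.0}
norm_q = 2.0
norm_d = {"doug": 4.0, "lucene": 4.0}

score = raw_score(query_tf, doc_tf, idf, norm_q, norm_d)
print(score)
```

Only "doug" occurs in both the query and the document, so the sum has a single term: (1 * 2 / 2) * (2 * 2 / 4). Note that idf_t enters once through the query weight and once through the document weight, which is the idf^2 effect mentioned elsewhere in the thread.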