--- On Sun, 3/13/11, Andy Newby <andy.ne...@gmail.com> wrote:

> From: Andy Newby <andy.ne...@gmail.com>
> Subject: Results driving me nuts!
> To: solr-user@lucene.apache.org
> Date: Sunday, March 13, 2011, 10:38 PM
>
> Hi,
>
> Ok, I'm really really trying to get my head around this,
> but I just can't :/
>
> Here are 2 example records, both using the query "st patricks" to
> search on (matches for the keywords are in **stars** like so, to make
> a point of what SHOULD be matching);
>
> keywords: animations mini alphabets **st** **patricks** animated 1
> clover animations mini alphabets **st** **patricks**
> description: animated 1 clover
>
> "124966":"
> 209.23984 = (MATCH) product of:
>   418.47968 = (MATCH) sum of:
>     418.47968 = (MATCH) sum of:
>       212.91336 = (MATCH) weight(keywords:st in 5697), product of:
>         0.41379675 = queryWeight(keywords:st), product of:
>           7.5798326 = idf(docFreq=233, maxDocs=168578)
>           0.05459181 = queryNorm
>         514.5361 = (MATCH) fieldWeight(keywords:st in 5697), product of:
>           1.4142135 = tf(termFreq(keywords:st)=2)
>           7.5798326 = idf(docFreq=233, maxDocs=168578)
>           48.0 = fieldNorm(field=keywords, doc=5697)
>       205.56633 = (MATCH) weight(keywords:patricks in 5697), product of:
>         0.4065946 = queryWeight(keywords:patricks), product of:
>           7.447905 = idf(docFreq=266, maxDocs=168578)
>           0.05459181 = queryNorm
>         505.58057 = (MATCH) fieldWeight(keywords:patricks in 5697), product of:
>           1.4142135 = tf(termFreq(keywords:patricks)=2)
>           7.447905 = idf(docFreq=266, maxDocs=168578)
>           48.0 = fieldNorm(field=keywords, doc=5697)
>   0.5 = coord(1/2)
>
> The other one:
>
> desc: a black and white mug of beer with a three leaf clover in it
> keywords: saint **patricks** day green irish beer spel132_bw clip
> art holidays **st** **patricks** day handle drink celebrate clip art
> holidays **st** **patricks** day
>
> 5 matches
>
> "145351":"
> 193.61652 = (MATCH) product of:
>   387.23303 = (MATCH) sum of:
>     387.23303 = (MATCH) sum of:
>       177.4278 = (MATCH) weight(keywords:st in 25380), product of:
>         0.41379675 = queryWeight(keywords:st), product of:
>           7.5798326 = idf(docFreq=233, maxDocs=168578)
>           0.05459181 = queryNorm
>         428.78006 = (MATCH) fieldWeight(keywords:st in 25380), product of:
>           1.4142135 = tf(termFreq(keywords:st)=2)
>           7.5798326 = idf(docFreq=233, maxDocs=168578)
>           40.0 = fieldNorm(field=keywords, doc=25380)
>       209.80525 = (MATCH) weight(keywords:patricks in 25380), product of:
>         0.4065946 = queryWeight(keywords:patricks), product of:
>           7.447905 = idf(docFreq=266, maxDocs=168578)
>           0.05459181 = queryNorm
>         516.006 = (MATCH) fieldWeight(keywords:patricks in 25380), product of:
>           1.7320508 = tf(termFreq(keywords:patricks)=3)
>           7.447905 = idf(docFreq=266, maxDocs=168578)
>           40.0 = fieldNorm(field=keywords, doc=25380)
>   0.5 = coord(1/2)
>
> Now the thing that's getting me is that the record which has 5
> occurrences of "st patricks" is so different in terms of the score
> it gives!
>
> 209.23984
> 193.61652
>
> (these should be the other way around)
>
> Can anyone try and explain what's going on with this?
>
> BTW, the queries are matched based on a normal "white space" index,
> nothing special.
>
> The actual query being used is as follows:
>
> (keywords:"st" AND keywords:"patricks") OR
> (description:"st" AND description:"patricks")
>
> TIA - I'm hoping someone can save my sanity ;)

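Their fieldNorm values are different: 48.0 for doc 5697 versus 40.0 for doc 25380. The norm combines the index-time boost with length normalization:

http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm

I can see that the keywords field of the record with 5 matches is longer than the other one's. Length normalization in Solr/Lucene favors shorter fields, and here the lower fieldNorm outweighs the extra occurrence of "patricks", so the longer record ends up with the lower score.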
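
To make the arithmetic concrete, here is a minimal Python sketch that recomputes both scores from the numbers in the explain output. It only mirrors the structure of the explain tree (coord * sum of queryWeight * fieldWeight, with fieldWeight = tf * idf * fieldNorm); the function and variable names are mine and are not part of any Solr or Lucene API. The last line is a hypothetical: it shows what record 145351 would score if its fieldNorm were also 48.0.

```python
# Values copied from the explain output quoted above.
QUERY_NORM = 0.05459181
IDF = {"st": 7.5798326, "patricks": 7.447905}
COORD = 0.5  # coord(1/2): only the keywords clause matched


def term_weight(term, tf, field_norm):
    """queryWeight * fieldWeight for one term, as laid out in the explain tree."""
    query_weight = IDF[term] * QUERY_NORM
    field_weight = tf * IDF[term] * field_norm
    return query_weight * field_weight


def score(tfs, field_norm):
    """coord * sum of the per-term weights."""
    return COORD * sum(term_weight(t, tf, field_norm) for t, tf in tfs.items())


# tf is sqrt(termFreq): sqrt(2) = 1.4142135, sqrt(3) = 1.7320508
doc_124966 = score({"st": 1.4142135, "patricks": 1.4142135}, field_norm=48.0)
doc_145351 = score({"st": 1.4142135, "patricks": 1.7320508}, field_norm=40.0)

# Hypothetical: record 145351 with the same fieldNorm as record 124966.
doc_145351_same_norm = score({"st": 1.4142135, "patricks": 1.7320508}, field_norm=48.0)

print(round(doc_124966, 2), round(doc_145351, 2), round(doc_145351_same_norm, 2))
```

The first two numbers come out at roughly 209.24 and 193.62, matching the explain output. With equal norms, the record with more matches would score about 232.34 and rank first, so the ordering is decided by fieldNorm alone. Since the default length normalization is 1/sqrt(number of terms), which cannot exceed 1, norms of 48.0 and 40.0 suggest an index-time boost is also being applied to this field. If length should not influence ranking, one option is to set omitNorms="true" on the field in schema.xml and reindex, but note that this also discards index-time boosts.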