--- On Sun, 3/13/11, Andy Newby <andy.ne...@gmail.com> wrote:

> From: Andy Newby <andy.ne...@gmail.com>
> Subject: Results driving me nuts!
> To: solr-user@lucene.apache.org
> Date: Sunday, March 13, 2011, 10:38 PM
> Hi,
> 
> Ok, I'm really really trying to get my head around this,
> but I just can't :/
> 
> Here are 2 example records, both using the query "st
> patricks" to
> search on (matches for the keywords are in **stars** like
> so, to make
> a point of what SHOULD be matching);
> 
> keywords: animations mini alphabets **st** **patricks**
> animated 1
> clover  animations mini alphabets **st** **patricks**
> description: animated 1 clover
> 
> "124966":"
> 209.23984 = (MATCH) product of:
>   418.47968 = (MATCH) sum of:
>     418.47968 = (MATCH) sum of:
>       212.91336 = (MATCH) weight(keywords:st
> in 5697), product of:
>         0.41379675 =
> queryWeight(keywords:st), product of:
>           7.5798326 =
> idf(docFreq=233, maxDocs=168578)
>           0.05459181 = queryNorm
>         514.5361 = (MATCH)
> fieldWeight(keywords:st in 5697), product of:
>           1.4142135 =
> tf(termFreq(keywords:st)=2)
>           7.5798326 =
> idf(docFreq=233, maxDocs=168578)
>           48.0 =
> fieldNorm(field=keywords, doc=5697)
>       205.56633 = (MATCH)
> weight(keywords:patricks in 5697), product of:
>         0.4065946 =
> queryWeight(keywords:patricks), product of:
>           7.447905 =
> idf(docFreq=266, maxDocs=168578)
>           0.05459181 = queryNorm
>         505.58057 = (MATCH)
> fieldWeight(keywords:patricks in 5697), product of:
>           1.4142135 =
> tf(termFreq(keywords:patricks)=2)
>           7.447905 =
> idf(docFreq=266, maxDocs=168578)
>           48.0 =
> fieldNorm(field=keywords, doc=5697)
>   0.5 = coord(1/2)
> 
> The other one:
> 
> desc: a black and white mug of beer with a three leaf
> clover in it
> keywords: saint **patricks** day green irish
> beer   spel132_bw clip
> art holidays **st** **patricks** day
> handle drink celebrate clip art holidays **st**
> **patricks** day
> 
> 5 matches
> 
> "145351":"
> 193.61652 = (MATCH) product of:
>   387.23303 = (MATCH) sum of:
>     387.23303 = (MATCH) sum of:
>       177.4278 = (MATCH) weight(keywords:st
> in 25380), product of:
>         0.41379675 =
> queryWeight(keywords:st), product of:
>           7.5798326 =
> idf(docFreq=233, maxDocs=168578)
>           0.05459181 = queryNorm
>         428.78006 = (MATCH)
> fieldWeight(keywords:st in 25380), product of:
>           1.4142135 =
> tf(termFreq(keywords:st)=2)
>           7.5798326 =
> idf(docFreq=233, maxDocs=168578)
>           40.0 =
> fieldNorm(field=keywords, doc=25380)
>       209.80525 = (MATCH)
> weight(keywords:patricks in 25380), product of:
>         0.4065946 =
> queryWeight(keywords:patricks), product of:
>           7.447905 =
> idf(docFreq=266, maxDocs=168578)
>           0.05459181 = queryNorm
>         516.006 = (MATCH)
> fieldWeight(keywords:patricks in 25380), product of:
>           1.7320508 =
> tf(termFreq(keywords:patricks)=3)
>           7.447905 =
> idf(docFreq=266, maxDocs=168578)
>           40.0 =
> fieldNorm(field=keywords, doc=25380)
>   0.5 = coord(1/2)
> 
> 
> Now the thing that's getting me is that the record which has 5
> occurrences of
> "st patricks" is so different in terms of the score it
> gives!
> 
> 209.23984
> 193.61652
> 
> (these should be the other way around)
> 
> Can anyone try and explain what's going on with this?
> 
> BTW, the queries are matched based on a normal "white
> space" index,
> nothing special.
> 
> The actual query being used, is as follows:
> 
> (keywords:"st" AND keywords:"patricks") OR
> (description:"st" AND
> description:"patricks")
> 
> TIA - I'm hoping someone can save my sanity ;)

Their fieldNorm values are different: 48.0 for doc 5697 versus 40.0 for 
doc 25380. The norm combines the index-time boost with length normalization. 

http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/Similarity.html#formula_norm

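I can see that the keywords field of the one with 5 matches is longer than 
the other's. Solr/Lucene's length normalization favors shorter fields, so the 
longer one gets the lower fieldNorm, and that outweighs its extra match on 
"patricks" (tf only grows with the square root of termFreq).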
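
To make the arithmetic concrete, here is a rough Python sketch that rebuilds both scores from the numbers in the explain output above. The function names (term_weight, score) are made up for illustration and are not Lucene APIs; the formula is just the product structure that the explain output itself prints, with tf taken as sqrt(termFreq). The last line is a hypothetical what-if: the same document scored with the other document's fieldNorm.

```python
import math

# Values copied from the explain output in the quoted message.
QUERY_NORM = 0.05459181
IDF_ST = 7.5798326
IDF_PATRICKS = 7.447905


def term_weight(idf, term_freq, field_norm, query_norm=QUERY_NORM):
    """weight = queryWeight * fieldWeight, as laid out in the explain output."""
    query_weight = idf * query_norm                          # idf * queryNorm
    field_weight = math.sqrt(term_freq) * idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


def score(terms, field_norm, coord=0.5):
    """Sum the per-term weights, then apply coord (1/2 in both explains)."""
    return coord * sum(term_weight(idf, tf, field_norm) for idf, tf in terms)


# Record 124966 (doc 5697): st x2, patricks x2, fieldNorm 48.0
doc_124966 = score([(IDF_ST, 2), (IDF_PATRICKS, 2)], field_norm=48.0)

# Record 145351 (doc 25380): st x2, patricks x3, fieldNorm 40.0
doc_145351 = score([(IDF_ST, 2), (IDF_PATRICKS, 3)], field_norm=40.0)

# Hypothetical: record 145351 if its fieldNorm were also 48.0
doc_145351_same_norm = score([(IDF_ST, 2), (IDF_PATRICKS, 3)], field_norm=48.0)

print(doc_124966, doc_145351, doc_145351_same_norm)
```

The first two values should come out close to the 209.23984 and 193.61652 reported above (small differences are expected because Lucene computes in single precision). With equal fieldNorm values the record with more matches would score higher, so the ordering you see comes from the norm and not from the term frequencies.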


