Thanks Emmanuel for that explanation. I implemented your solution but I'm not quite there yet. Suppose I also have a record:
RECORD 3
<arr name="myname">
  <str>Fred G. Anderson</str>
  <str>Fred Anderson</str>
</arr>

With your solution, RECORD 1 does appear at the top, but I think that's just blind luck more than anything else, because RECORD 3 shows as having the same score. So what more can I do to push RECORD 1 up to the top? Ideally, I'd like all three records returned, with RECORD 1 being the first listing.

Thanks,

Brian Lamb

On Tue, Jul 26, 2011 at 6:03 PM, Emmanuel Espina <espinaemman...@gmail.com> wrote:

> That is caused by the size of the documents. The principle is pretty
> intuitive if one of your documents is the entire three volumes of The Lord
> of the Rings, and you search for "tree" I know that The Lord of the Rings
> will be in the results, and I haven't memorized the entire text of that
> book
> :p
> It is a matter of probability that if you have a big (big!) text any word
> will have a greater chance to be found than in a smaller letter. So one can
> infer that the letter is more relevant than the big text. That is the
> principle applied here and Lucene does that when building the ranking.
> The first document is bigger (remember that all the values of a multivalued
> field are merged into one field in the index, so you can not tell one value
> from another apart) than the second one. In the first one you have
> [Fred, coolest,
> guy, town] and in the second [Fred, Anderson], so the second document is
> more relevant than the first one.
>
> To avoid all this procedure you can set omitNorms to true and that should
> make the first document more relevant because Fred appears twice (not
> because Fred appears alone in a value)
>
> Regards
> Emmanuel
>
> 2011/7/26 Brian Lamb <brian.l...@journalexperts.com>
>
> > Hi all,
> >
> > I am a little confused as to why the scoring is working the way it is:
> >
> > I have a field defined as:
> >
> > <field name="myname" type="text" indexed="true" stored="true"
> > required="false" multivalued="true" />
> >
> > And I have several documents where that value is:
> >
> > RECORD 1
> > <arr name="myname">
> >   <str>Fred</str>
> >   <str>Fred (the coolest guy in town)</str>
> > </arr>
> >
> > OR
> >
> > RECORD 2
> > <arr name="myname">
> >   <str>Fred Anderson</str>
> > </arr>
> >
> > What happens when I do a search for
> > http://localhost:8983/solr/search/?q=myname:Fred I get RECORD 2
> > returned before RECORD 1.
> >
> > RECORD 2
> > 5.282213 = (MATCH) fieldWeight(myname:Fred in 256575), product of:
> >   1.0 = tf(termFreq(myname:Fred)=1)
> >   8.451541 = idf(docFreq=7306, maxDocs=12586425)
> >   0.625 = fieldNorm(field=myname, doc=256575)
> >
> > RECORD 1
> > 4.482106 = (MATCH) fieldWeight(myname:Fred in 215), product of:
> >   1.4142135 = tf(termFreq(myname:Fred)=2)
> >   8.451541 = idf(docFreq=7306, maxDocs=12586425)
> >   0.375 = fieldNorm(field=myname, doc=215)
> >
> > So the difference is fieldNorm obviously but I think that's only part
> > of the story. Why is RECORD 2 returned with a higher score than RECORD
> > 1 even though RECORD 1 matches "Fred" exactly? And how should I do
> > this differently so that I am getting the results I am expecting?
> >
> > Thanks,
> >
> > Brian Lamb
> >
>
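
For anyone following the numbers in this thread, the explain output can be reproduced by hand. The following is a rough Python sketch, not Solr code. It assumes the formula shown in the explain output (tf is the square root of termFreq, multiplied by idf and fieldNorm), and it assumes that omitNorms="true" behaves like a fieldNorm of 1.0. The function name field_weight is hypothetical, and the RECORD 3 termFreq of 2 (one "Fred" in each of its two values) is an assumption, since no explain output for RECORD 3 was posted.

```python
import math

# idf for myname:Fred, copied from the explain output in this thread.
IDF_FRED = 8.451541


def field_weight(term_freq, idf, field_norm):
    """Hypothetical helper: sqrt(termFreq) * idf * fieldNorm."""
    return math.sqrt(term_freq) * idf * field_norm


# Norms enabled: termFreq and fieldNorm taken from the explain output.
record_1 = field_weight(2, IDF_FRED, 0.375)
record_2 = field_weight(1, IDF_FRED, 0.625)

# omitNorms="true", assumed here to behave like fieldNorm = 1.0.
# RECORD 3's termFreq of 2 is an assumption (one "Fred" per value).
record_1_no_norms = field_weight(2, IDF_FRED, 1.0)
record_2_no_norms = field_weight(1, IDF_FRED, 1.0)
record_3_no_norms = field_weight(2, IDF_FRED, 1.0)

print(f"norms on:  RECORD 1 = {record_1:.6f}, RECORD 2 = {record_2:.6f}")
print(f"norms off: RECORD 1 = {record_1_no_norms:.6f}, "
      f"RECORD 2 = {record_2_no_norms:.6f}, "
      f"RECORD 3 = {record_3_no_norms:.6f}")
```

Under these assumptions, RECORD 1 and RECORD 3 get identical scores once norms are off, because both contain "Fred" twice. That is consistent with the tie described at the top of this message: termFreq alone cannot distinguish a value that is exactly "Fred" from a value that merely contains it.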