Thanks Emmanuel for that explanation. I implemented your solution but I'm
not quite there yet. Suppose I also have a record:

RECORD 3
<arr name="myname">
  <str>Fred G. Anderson</str>
  <str>Fred Anderson</str>
</arr>

With your solution, RECORD 1 does appear at the top, but I think that's just
blind luck more than anything else, because RECORD 3 shows as having the same
score. So what more can I do to push RECORD 1 up to the top? Ideally, I'd
like all three records returned, with RECORD 1 listed first.
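
Here is a rough sketch of why I think RECORD 1 and RECORD 3 tie. It assumes
that with omitNorms set to true the fieldNorm factor is effectively 1.0, so
the score reduces to tf * idf, with tf = sqrt(termFreq) as in the explain
output quoted below. The function name is made up for illustration, and the
term counts are simply how many times "Fred" occurs across all values of
myname in each record.

```python
import math

# idf for myname:Fred, copied from the explain output quoted below.
IDF = 8.451541

# How many times "Fred" occurs across all values of myname in each record.
term_freq = {"RECORD 1": 2, "RECORD 2": 1, "RECORD 3": 2}


def score_without_norms(freq, idf=IDF):
    """Hypothetical helper: tf * idf with fieldNorm assumed to be 1.0."""
    return math.sqrt(freq) * idf


scores = {name: score_without_norms(freq) for name, freq in term_freq.items()}

# Print the records from highest to lowest score.
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.6f}")
```

Under those assumptions RECORD 1 and RECORD 3 get identical scores, because
both contain "Fred" twice, and nothing is left in the formula that would tell
them apart.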

Thanks,

Brian Lamb

On Tue, Jul 26, 2011 at 6:03 PM, Emmanuel Espina
<espinaemman...@gmail.com> wrote:

> That is caused by the size of the documents. The principle is pretty
> intuitive: if one of your documents is the entire three volumes of The Lord
> of the Rings and you search for "tree", I know that The Lord of the Rings
> will be in the results, and I haven't memorized the entire text of that
> book :p
> It is a matter of probability: in a big (big!) text, any word has a greater
> chance of being found than in a short letter. So if the word is found in
> both, one can infer that the letter is more relevant than the big text. That
> is the principle applied here, and Lucene does that when building the ranking.
> The first document is bigger than the second one (remember that all the
> values of a multivalued field are merged into one field in the index, so you
> cannot tell one value apart from another). In the first one you have
> [Fred, coolest, guy, town] and in the second [Fred, Anderson], so the second
> document is more relevant than the first one.
>
> To avoid this length normalization, you can set omitNorms to true. That
> should make the first document more relevant, because Fred appears twice
> (not because Fred appears alone in a value).
>
> Regards
> Emmanuel
>
> 2011/7/26 Brian Lamb <brian.l...@journalexperts.com>
>
> > Hi all,
> >
> > I am a little confused as to why the scoring is working the way it is:
> >
> > I have a field defined as:
> >
> > <field name="myname" type="text" indexed="true" stored="true"
> > required="false" multiValued="true" />
> >
> > And I have several documents where that value is:
> >
> > RECORD 1
> > <arr name="myname">
> >  <str>Fred</str>
> >  <str>Fred (the coolest guy in town)</str>
> > </arr>
> >
> > OR
> >
> > RECORD 2
> > <arr name="myname">
> >  <str>Fred Anderson</str>
> > </arr>
> >
> > When I do a search for
> > http://localhost:8983/solr/search/?q=myname:Fred
> > I get RECORD 2 returned before RECORD 1.
> >
> > RECORD 2
> > 5.282213 = (MATCH) fieldWeight(myname:Fred in 256575), product of:
> >  1.0 = tf(termFreq(myname:Fred)=1)
> >  8.451541 = idf(docFreq=7306, maxDocs=12586425)
> >  0.625 = fieldNorm(field=myname, doc=256575)
> >
> > RECORD 1
> > 4.482106 = (MATCH) fieldWeight(myname:Fred in 215), product of:
> >  1.4142135 = tf(termFreq(myname:Fred)=2)
> >  8.451541 = idf(docFreq=7306, maxDocs=12586425)
> >  0.375 = fieldNorm(field=myname, doc=215)
> >
> > So the difference is fieldNorm obviously but I think that's only part
> > of the story. Why is RECORD 2 returned with a higher score than RECORD
> > 1 even though RECORD 1 matches "Fred" exactly? And how should I do
> > this differently so that I am getting the results I am expecting?
> >
> > Thanks,
> >
> > Brian Lamb
> >
>

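For reference, the two fieldWeight values in the explain output quoted above
can be reproduced by hand. This is only a minimal sketch: it assumes that
fieldWeight is the plain product tf * idf * fieldNorm and that tf is the
square root of the term frequency, which matches the 1.0 and 1.4142135 shown
in the output. The name field_weight is invented for this example and is not
a Lucene or Solr API.

```python
import math


def field_weight(term_freq, idf, field_norm):
    """Hypothetical helper: tf * idf * fieldNorm, with tf = sqrt(term_freq)."""
    return math.sqrt(term_freq) * idf * field_norm


# idf and fieldNorm values copied from the quoted explain output.
idf = 8.451541
record_2 = field_weight(1, idf, 0.625)  # "Fred" once, shorter field
record_1 = field_weight(2, idf, 0.375)  # "Fred" twice, longer field

# fieldNorm that RECORD 1 would need in order to equal RECORD 2's score.
break_even_norm = 0.625 / math.sqrt(2)

print(f"RECORD 2: {record_2:.6f}")
print(f"RECORD 1: {record_1:.6f}")
print(f"break-even fieldNorm for RECORD 1: {break_even_norm:.4f}")
```

The second occurrence of "Fred" only multiplies the score by about 1.41,
while the longer field cuts fieldNorm from 0.625 to 0.375, so the length
penalty outweighs the extra match.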