Re: Scoring by document size

Erick Erickson Tue, 17 Sep 2013 06:12:15 -0700

This kind of artificial test is almost always misleading.
Some approximations are used. In particular, the length
of the field is not stored as an exact number, so fields
with slightly different lengths are "rounded" to the same
value, hence the identical scores you're seeing.
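
To make the rounding concrete, here is a rough sketch in Python.
It assumes the default similarity of Lucene 4.x, which (as far as
I know) computes the length norm as 1/sqrt(number of terms) and
squeezes it into a single byte. The function names below are
made up for illustration and are not the Lucene API.

```python
import math
import struct

# Hypothetical re-implementation for illustration only; assumes the
# one-byte norm encoding used by Lucene 4.x's default similarity.
EXPONENT_ZERO = (63 - 15) << 3  # offset of the smallest encodable exponent


def encode_norm(value):
    """Pack a float into one byte by keeping only its top bits."""
    bits = struct.unpack(">i", struct.pack(">f", value))[0]
    small = bits >> 21  # discard the 21 low-order mantissa bits
    if small <= EXPONENT_ZERO:  # underflow: zero/negative -> 0, tiny -> 1
        return 0 if bits <= 0 else 1
    if small >= EXPONENT_ZERO + 0x100:  # overflow: clamp to the largest byte
        return 255
    return small - EXPONENT_ZERO


def decode_norm(byte):
    """Expand the byte back into the (now coarser) float."""
    if byte == 0:
        return 0.0
    bits = (byte << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_length_norm(num_terms):
    """Length norm as it would come back after the byte round trip."""
    return decode_norm(encode_norm(1.0 / math.sqrt(num_terms)))


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(n, 1.0 / math.sqrt(n), stored_length_norm(n))
```

Under these assumptions, fields with 3 and 4 terms both come
back as 0.5, so the two documents end up with the same score
even though their exact norms differ.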


Unless you have a compelling reason, I wouldn't
spend too much time trying to adjust scores in this
kind of situation. If your real data exhibits behavior
you need to change, it's a different story of course.

Best,
Erick


On Tue, Sep 17, 2013 at 3:40 AM, blopez <balo...@hotmail.com> wrote:

> Hi all,
>
> I have some doubts about the Solr scoring function. I'm using the default
> configuration throughout, but I'm facing a weird issue with the retrieved
> scores.
>
> In the schema, I'm going to focus on the only field I'm interested in. Its
> definition is:
>
> <fieldType name="text" class="solr.TextField" sortMissingLast="true"
>            omitNorms="false">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> <field name="myField" type="text" indexed="true" stored="true"
>        required="false" />
>
> (omitNorms="false" is set on purpose; otherwise the document size is not
> taken into account in the final score)
>
> Then, I index some documents, with the following text in the 'myField'
> field:
>
> doc1 = "A B C"
> doc2 = "A B C D"
> doc3 = "A B C D E"
> doc4 = "A B C D E F"
> doc5 = "A B C D E F G H"
> doc6 = "A B C D E F G H I"
>
> Finally, I perform the query 'myField:("A" "B" "C")' in order to retrieve
> all the documents, but with different scores (doc1 is more similar to the
> query than doc2, which is more similar than doc3, ...).
>
> All the documents are retrieved (OK), but the scores are like this:
>
> doc1 = 2,590214
> doc2 = 2,590214
> doc3 = 2,266437
> doc4 = 1,94266
> doc5 = 1,94266
> doc6 = 1,618884
>
> So in conclusion, as you can see, the score goes down, but not the way I'd
> like. Doc1 gets the same score as Doc2, even though Doc1 matches 3/3
> tokens and Doc2 matches 3/4 tokens.
>
> Is this the normal Solr behaviour? Is there any way to get my expected
> behaviour?
>
> Thanks a lot,
> Borja.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Scoring-by-document-size-tp4090523.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
