Re: Relevancy Scoring

Doug Turnbull Mon, 18 May 2015 11:21:47 -0700

Also, I wouldn't expect scores to be comparable from one query to the
next, so big swings in scoring when you change the query don't surprise
me. So many parts of the scoring equation can change query to query.
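
To make that concrete, here is a rough sketch of the two query-level
factors I mention in the quoted message below (queryNorm and coord),
assuming Lucene's classic TF-IDF similarity is in use. The corpus size
and document frequencies are made up for illustration; they are not
from John's index. The function names are mine, not Lucene's API.

```python
import math

def idf(doc_freq, num_docs):
    # Classic TF-IDF inverse document frequency: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def query_norm(term_idfs):
    # queryNorm = 1 / sqrt(sum of squared term weights), boosts assumed to be 1
    return 1.0 / math.sqrt(sum(w * w for w in term_idfs))

def coord(matched_clauses, total_clauses):
    # Fraction of the query's boolean clauses that a document matched
    return matched_clauses / total_clauses

NUM_DOCS = 100_000                    # hypothetical corpus size
rare = idf(9, NUM_DOCS)               # hypothetical: a product-number token
common = idf(49_999, NUM_DOCS)        # hypothetical: a common description token

# Query over manufacturer + product number only
norm_without_description = query_norm([rare, rare])
# Same query with two description terms added
norm_with_description = query_norm([rare, rare, common, common])

# A document matching 2 of 2 clauses vs. 2 of 3 after adding a field
coord_without_description = coord(2, 2)
coord_with_description = coord(2, 3)

print(norm_without_description, norm_with_description)
print(coord_without_description, coord_with_description)
```

Both factors shrink when the extra field is added, which is one way a
maxScore can drop even though the top document hasn't changed. queryNorm
is the same for every document within a single query, so it moves the
absolute scores without changing the ranking.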


On Mon, May 18, 2015 at 2:18 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> > The maxScore is 772 when I remove the description.
> > I suppose the actual question, then, is if a low relevancy score on one
> > field hurts the rest of them / the cumulative score,
>
> This depends a lot on how you're searching over these fields. Is this a
> (e)dismax query? Or a lucene query? Something else?
>
> Across fields there's query normalization, which scales the score by
> one over the square root of the sum of squared IDFs of the search terms
> across the fields being searched. Adding/removing a field changes that
> sum, so it could impact query normalization.
>
> By removing a field, you also likely remove a boolean clause. By removing
> the clause, there's less of a chance the coordinating factor (known as
> coord) would punish your relevancy score.
>
> Otherwise, don't know -- perhaps you could give us more information on how
> you're searching your documents? Perhaps a sample Solr URL that shows how
> you're querying?
>
> Cheers,
> --
> *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections,
> LLC | 240.476.9983 | http://www.opensourceconnections.com
> Author: Relevant Search <http://manning.com/turnbull> from Manning
> Publications
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.
> On Mon, May 18, 2015 at 1:57 PM, John Blythe <j...@curvolabs.com> wrote:
>
>> Background:
>> I'm using Solr as a mechanism for search for users, but before even
>> getting to that point as a means of intelligent inference more or less.
>> Product data comes in and we're hoping to match it to the correct known
>> product without having to use the user for confirmation/search.
>>
>> Problem:
>> I get a maxScore (with the correct result at the top) of 618.22626 using
>> the manufacturer's name, the product number, and the product description.
>> All of these items are coming from a previous purchaser so we have to
>> account for manufacturer name variations, miskeying of product numbers,
>> and variances of descriptions. The maxScore is 772 when I remove the
>> description.
>>
>> My initial question is regarding relevancy scoring (
>> https://wiki.apache.org/solr/SolrRelevancyFAQ). I get that many of the
>> description's tokens will be found throughout the other documents, thus
>> keeping the relevancy at bay per the IDF portion of the relevancy score. I
>> suppose the actual question, then, is if a low relevancy score on one field
>> hurts the rest of them / the cumulative score, or if it simply keeps that
>> field's contribution lower than it'd otherwise be. I thought it was the
>> latter, but the results I mention above are making me think that the first
>> scenario is actually the case.
>>
>> Based on what I hear about the above, a follow up question may be what in
>> the world is wrong with my analyzer :)
>>
>> Thanks for any thoughts!
>>
>> Best,
>> John
>>


