Re: Relevancy Scoring

John Blythe Mon, 18 May 2015 11:27:31 -0700

Hey Doug,

Thanks for the quick reply.


No edismax just yet. I'm planning on getting there, but over the last week
or so I've been trying to fine-tune the three primary fields we use before
jumping into edismax and its nifty toolset to push our accuracy and
precision even further. (Aside: is this a good strategy?)

For now I'm querying directly in the admin interface, doing something like
this:
mfgname2: Ben & Jerry's + descript1: Strawberry Shortcake Ice Cream 1.5pt +
productnumber: 001-029-1298

versus
mfgname2: Ben & Jerry's + descript1: Strawberry Shortcake Ice Cream 1.5pt
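
Doug asks below for a sample Solr URL. Here is a minimal sketch of how the two admin-interface queries above could be turned into `/select` URLs. It assumes a local Solr with a hypothetical core named `products`, so the host, port, and core name are placeholders. `fl=*,score` and `debugQuery=true` are standard Solr request parameters, and with `debugQuery` on, the response includes per-document score explanations.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint: host, port, and the core name "products" are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/products/select"


def select_url(q: str, rows: int = 10) -> str:
    """Build a /select URL for a raw query string, asking Solr to explain scores."""
    params = {
        "q": q,                # same text typed into the admin query box
        "fl": "*,score",       # stored fields plus each document's score
        "rows": rows,
        "wt": "json",
        "debugQuery": "true",  # adds per-document score explanations
    }
    # urlencode escapes "&" as %26 and "+" as %2B so they reach Solr intact
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    with_pn = ("mfgname2: Ben & Jerry's + descript1: Strawberry Shortcake "
               "Ice Cream 1.5pt + productnumber: 001-029-1298")
    without_pn = ("mfgname2: Ben & Jerry's + descript1: Strawberry Shortcake "
                  "Ice Cream 1.5pt")
    for q in (with_pn, without_pn):
        url = select_url(q)
        print(url)
        # Round-trip check: decoding the URL gives back the original query
        assert parse_qs(urlsplit(url).query)["q"] == [q]
```

Posting one of these URLs back to the list shows exactly what Solr received. The debug output's `explain` entries also show how each field's clauses contributed to each document's score.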

Another interesting, and likely related, factor is how little the
description helps. With the product number in place, the right product gets
nailed even with stray zeros, 4s instead of 1s, etc.

Without it, though, the results just flat out suck. For instance, I just
saw something akin to this:
mfgname2: Ben & Jerry's + descript1: Straw Shortcake Ice Cream 1.5pt

That query got nowhere near what it should have. "Straw" has a synonym
mapping it to "strawberry", so it should match the document's description
*exactly*, yet Solr pushed out all sorts of peripheral suggestions that
either didn't match strawberry or were a different size (.75pt, for
instance). I know I'm no expert, but I was thinking my analyzer was a bit
better than that :p

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Mon, May 18, 2015 at 2:18 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> > The maxScore is 772 when I remove the description.
> > I suppose the actual question, then, is if a low relevancy score on one
> > field hurts the rest of them / the cumulative score,
>
> This depends a lot on how you're searching over these fields. Is this an
> (e)dismax query? A Lucene query? Something else?
>
> Across fields there's query normalization, which takes the sum of squares
> of the IDFs of the search terms across all the fields being searched and
> multiplies every clause's weight by 1/sqrt(that sum). Adding or removing a
> field changes that sum, so it can rescale every part of the score.
>
> By removing a field, you also likely remove a boolean clause. With fewer
> clauses, there's less of a chance that the coordination factor (known as
> coord), which scales a document's score by the fraction of query clauses it
> matches, will punish your relevancy score.
>
> Otherwise, don't know -- perhaps you could give us more information on how
> you're searching your documents? Perhaps a sample Solr URL that shows how
> you're querying?
>
> Cheers,
> --
> *Doug Turnbull* | Search Relevance Consultant | OpenSource Connections,
> LLC | 240.476.9983 | http://www.opensourceconnections.com
> Author: Relevant Search <http://manning.com/turnbull> from Manning
> Publications
> On Mon, May 18, 2015 at 1:57 PM, John Blythe <j...@curvolabs.com> wrote:
>
> > Background:
> > I'm using Solr as a search mechanism for users but, before even getting
> > to that point, also more or less as a means of intelligent inference.
> > Product data comes in and we're hoping to match it to the correct known
> > product without needing the user to confirm or search.
> >
> > Problem:
> > I get a maxScore (with the correct result at the top) of 618.22626 using
> > the manufacturer's name, the product number, and the product description.
> > All of these items come from a previous purchaser, so we have to account
> > for manufacturer name variations, miskeyed product numbers, and variances
> > in descriptions. The maxScore is 772 when I remove the description.
> >
> > My initial question is about relevancy scoring
> > (https://wiki.apache.org/solr/SolrRelevancyFAQ). I get that many of the
> > description's tokens will be found throughout the other documents, which
> > keeps their relevancy down via the IDF portion of the score. I suppose the
> > actual question, then, is whether a low relevancy score on one field hurts
> > the rest of them / the cumulative score, or whether it simply keeps that
> > field's contribution lower than it would otherwise be. I thought it was
> > the latter, but the results I mention above are making me think the first
> > scenario is actually the case.
> >
> > Based on what I hear about the above, a follow-up question may be what in
> > the world is wrong with my analyzer :)
> >
> > Thanks for any thoughts!
> >
> > Best,
> > John
> >
>
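
The IDF, query normalization, and coord factors discussed in this thread can be made concrete with a small model. The Python below is a simplified sketch that assumes Lucene's classic TF-IDF formulas, the default similarity in Solr at the time. It fixes tf, field norms, and boosts at 1 and treats the query as one flat boolean query of term clauses. The document frequencies in the demo are made up for illustration and do not come from John's index.

```python
import math
from typing import Sequence

# Simplified sketch of Lucene's classic TF-IDF scoring (DefaultSimilarity in
# Lucene 4.x/5.x). Assumptions: one flat BooleanQuery of term clauses,
# tf = 1, fieldNorm = 1, and no boosts. Real scores include those factors.


def idf(num_docs: int, doc_freq: int) -> float:
    """Classic idf: 1 + ln(numDocs / (docFreq + 1)). Common terms score low."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def query_norm(idfs: Sequence[float]) -> float:
    """1 / sqrt(sum of squared term weights) over ALL clauses in the query."""
    return 1.0 / math.sqrt(sum(w * w for w in idfs))


def score(idfs: Sequence[float], matched: Sequence[bool]) -> float:
    """coord * sum(idf^2 * queryNorm) over the clauses this document matched."""
    if len(idfs) != len(matched):
        raise ValueError("need one matched flag per query clause")
    norm = query_norm(idfs)
    raw = sum(w * w * norm for w, hit in zip(idfs, matched) if hit)
    coord = sum(matched) / len(matched)  # fraction of clauses matched
    return coord * raw


if __name__ == "__main__":
    n = 100_000              # hypothetical index size
    rare = idf(n, 3)         # e.g. a product-number token
    common = idf(n, 20_000)  # e.g. "cream" across many descriptions
    print("2 rare clauses, all matched:   ", score([rare, rare], [True, True]))
    print("+2 common clauses, all matched:",
          score([rare, rare, common, common], [True] * 4))
    print("+2 common clauses, one missed: ",
          score([rare, rare, common, common], [True, True, True, False]))
```

In this simplified model, adding two common description terms that both match nudges the score up slightly. If the document misses one of them, however, queryNorm has already shrunk every clause's weight and coord multiplies the total by 3/4, so the score lands well below the two-clause query. That is one way a weakly matching field can lower the cumulative score rather than merely contributing little.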
