Fascinating!

Thank you so much Erick, I'm slowly beginning to understand.

So I've discovered that by setting splitOnNumerics="0" on the
solr.WordDelimiterFilterFactory filter (for the query analyzer only) I
can get *closer* to my required goal!
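
For reference, the relevant part of my field type now looks something like
this (the tokenizer and the other filters below are illustrative;
splitOnNumerics="0" on the query analyzer is the only actual change):

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- splitOnNumerics="0" stops splitting on letter/number transitions,
           so b006m86d stays one token instead of b/006/m/86/d -->
      <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>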

Now something else odd is occurring.

It only returns 2 results when there are over 70?

Why is that? I can't find where this is explained :(
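
As a sanity check, a plain term query against one of the boosted fields
(standard query parser, rows=0 so only numFound comes back; the field name
here is just an example) should report the raw count of matching documents:

/solr/select?q=brand_container_id:b006m86d&rows=0&wt=json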

Query:

/solr/select?omitNorms=true&q=b006m86d&defType=dismax&qf=id^10%20parent_id^9%20brand_container_id^8%20series_container_id^8%20subseries_container_id^8%20clip_container_id^1%20clip_episode_id^1&debugQuery=on&fl=type,id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episode_id,score&wt=json&indent=on&omitNorms=true

Output:

{
  "responseHeader": {
    "status": 0,
    "QTime": 51,
    "params": {
      "debugQuery": "on",
      "fl": "type,id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episode_id,score",
      "indent": "on",
      "q": "b006m86d",
      "qf": "id^10 parent_id^9 brand_container_id^8 series_container_id^8 subseries_container_id^8 clip_container_id^1 clip_episode_id^1",
      "wt": "json",
      "omitNorms": ["true", "true"],
      "defType": "dismax"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "maxScore": 13.473297,
    "docs": [
      {
        "parent_id": "",
        "id": "b006m86d",
        "type": "brand",
        "score": 13.473297
      },
      {
        "series_container_id": "",
        "id": "b00y1w9h",
        "type": "episode",
        "brand_container_id": "b006m86d",
        "subseries_container_id": "",
        "clip_episode_id": "",
        "score": 11.437143
      }
    ]
  },
  "debug": {
    "rawquerystring": "b006m86d",
    "querystring": "b006m86d",
    "parsedquery": "+DisjunctionMaxQuery((id:b006m86d^10.0 | clip_episode_id:b006m86d | subseries_container_id:b006m86d^8.0 | series_container_id:b006m86d^8.0 | clip_container_id:b006m86d | brand_container_id:b006m86d^8.0 | parent_id:b006m86d^9.0)) ()",
    "parsedquery_toString": "+(id:b006m86d^10.0 | clip_episode_id:b006m86d | subseries_container_id:b006m86d^8.0 | series_container_id:b006m86d^8.0 | clip_container_id:b006m86d | brand_container_id:b006m86d^8.0 | parent_id:b006m86d^9.0) ()",
    "explain": {
      "b006m86d": "13.473297 = (MATCH) sum of: 13.473297 = (MATCH) max of: 13.473297 = (MATCH) fieldWeight(id:b006m86d in 27636), product of: 1.0 = tf(termFreq(id:b006m86d)=1) 13.473297 = idf(docFreq=2, maxDocs=783800) 1.0 = fieldNorm(field=id, doc=27636)",
      "b00y1w9h": "11.437143 = (MATCH) sum of: 11.437143 = (MATCH) max of: 11.437143 = (MATCH) weight(brand_container_id:b006m86d^8.0 in 61), product of: 0.82407516 = queryWeight(brand_container_id:b006m86d^8.0), product of: 8.0 = boost 13.878762 = idf(docFreq=1, maxDocs=783800) 0.007422088 = queryNorm 13.878762 = (MATCH) fieldWeight(brand_container_id:b006m86d in 61), product of: 1.0 = tf(termFreq(brand_container_id:b006m86d)=1) 13.878762 = idf(docFreq=1, maxDocs=783800) 1.0 = fieldNorm(field=brand_container_id, doc=61)"
    },
    "QParser": "DisMaxQParser",
    "altquerystring": null,
    "boostfuncs": null,
    "timing": {
      "time": 51,
      "prepare": {
        "time": 6,
        "org.apache.solr.handler.component.QueryComponent": { "time": 5 },
        "org.apache.solr.handler.component.FacetComponent": { "time": 0 },
        "org.apache.solr.handler.component.MoreLikeThisComponent": { "time": 0 },
        "org.apache.solr.handler.component.HighlightComponent": { "time": 1 },
        "org.apache.solr.handler.component.StatsComponent": { "time": 0 },
        "org.apache.solr.handler.component.DebugComponent": { "time": 0 }
      },
      "process": {
        "time": 45,
        "org.apache.solr.handler.component.QueryComponent": { "time": 27 },
        "org.apache.solr.handler.component.FacetComponent": { "time": 0 },
        "org.apache.solr.handler.component.MoreLikeThisComponent": { "time": 0 },
        "org.apache.solr.handler.component.HighlightComponent": { "time": 0 },
        "org.apache.solr.handler.component.StatsComponent": { "time": 0 },
        "org.apache.solr.handler.component.DebugComponent": { "time": 18 }
      }
    }
  }
}
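
Following the explainOther suggestion from Erick's mail below, a request
like this (the id in explainOther is a placeholder for one of the ~70
missing documents) should show why those documents score the way they do:

/solr/select?q=b006m86d&defType=dismax&qf=id^10%20parent_id^9%20brand_container_id^8%20series_container_id^8%20subseries_container_id^8%20clip_container_id^1%20clip_episode_id^1&debugQuery=on&explainOther=id:PLACEHOLDER&wt=json&indent=on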


On 15 June 2011 13:16, Erick Erickson <erickerick...@gmail.com> wrote:

> First off, you didn't "violate the group's etiquette". In fact, yours was
> one of the better first posts in terms of providing enough information
> for us to actually help!
>
> A very useful page is the admin/analysis page, which shows how the
> analysis chain works. For instance, if you haven't changed the
> field type (i.e. <fieldType name="text">), your input is being
> broken up by WordDelimiterFilterFactory. Be sure to check the
> "verbose" checkbox and enter text in both the query and index
> boxes!
>
> Here's an invaluable page, though do note that it's not exhaustive:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
>
> But on to your problem:
>
> First, boosting isn't absolute; boosting terms just tends to
> bubble things up, so you have to experiment with various weights....
>
> To get the full comparison for both documents you're curious about,
> try using "explainOther". See:
>
> http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_doesn.27t_document_id:juggernaut_appear_in_the_top_10_results_for_my_query
>
> If you use that against the two docs in question, you should
> see (although it's a hard read!) the reason the docs got
> their relative scores.
>
> Finally, your next e-mail hints at what's happening. If you're
> putting multiple tokens in some of these fields, the length
> normalization may be causing the matches to score lower. You can
> try disabling those calculations (omitNorms="true" in your field
> definition).
> See:
>
> http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr
>
> String types accept spaces just fine, but you might want to define
> the fields with 'multiValued="true"' and index each value as a
> separate entry (note that won't work with a field that's also your
> <uniqueKey>).
>
> Best
> Erick
>
> On Wed, Jun 15, 2011 at 7:16 AM, Judioo <cont...@judioo.com> wrote:
> >   <dynamicField name="*_id" type="text" indexed="true" stored="true"/>
> >
> > so all fields except 'id' are of type text.
> >
> > I didn't know that about the string type. So is my problem as described
> > (that partial matches are contributing to the calculation), and does
> > defining the field type as string solve this problem?
> >
> > Or is my understanding completely incorrect?
> >
> > Thanks in advance
> >
> > On 15 June 2011 12:08, Ahmet Arslan <iori...@yahoo.com> wrote:
> >
> >> >
> >>
> /solr/select/?q=b007vty6&defType=dismax&qf=id^10%20parent_id^9%20brand_container_id^8%20series_container_id^8%20subseries_container_id^8%20clip_container_id^1%20clip_episode_id^1&debugQuery=on&fl=id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episode_id,score&wt=json&indent=on
> >> >
> >> >
> >> > same result (just higher scores). It's almost as if partial
> >> > matches on brand|series_container_id and id are being considered
> >> > in the 1st document. Surely this can't be right / expected?
> >>
> >> What is your fieldType definition? Don't you think it is better to use
> >> the string type, which is not tokenized?
> >>
> >
>
