Fascinating! Thank you so much Erick, I'm slowly beginning to understand.

So I've discovered that by setting splitOnNumerics="0" on the filter class
solr.WordDelimiterFilterFactory (for the query analyzer ONLY) I can get
*closer* to my required goal!

Now something else odd is occurring. The query only returns 2 results when
there are over 70. Why is that? I can't find where this is explained :(

query

/solr/select?omitNorms=true&q=b006m86d&defType=dismax&qf=id^10%20parent_id^9%20brand_container_id^8%20series_container_id^8%20subseries_container_id^8%20clip_container_id^1%20clip_episode_id^1&debugQuery=on&fl=type,id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episode_id,score&wt=json&indent=on&omitNorms=true

output

{
  responseHeader: {
    status: 0
    QTime: 51
    params: {
      debugQuery: "on"
      fl: "type,id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episode_id,score"
      indent: "on"
      q: "b006m86d"
      qf: "id^10 parent_id^9 brand_container_id^8 series_container_id^8 subseries_container_id^8 clip_container_id^1 clip_episode_id^1"
      wt: "json"
      omitNorms: [
        "true"
        "true"
      ]
      defType: "dismax"
    }
  }
  response: {
    numFound: 2
    start: 0
    maxScore: 13.473297
    docs: [
      {
        parent_id: ""
        id: "b006m86d"
        type: "brand"
        score: 13.473297
      }
      {
        series_container_id: ""
        id: "b00y1w9h"
        type: "episode"
        brand_container_id: "b006m86d"
        subseries_container_id: ""
        clip_episode_id: ""
        score: 11.437143
      }
    ]
  }
  debug: {
    rawquerystring: "b006m86d"
    querystring: "b006m86d"
    parsedquery: "+DisjunctionMaxQuery((id:b006m86d^10.0 | clip_episode_id:b006m86d | subseries_container_id:b006m86d^8.0 | series_container_id:b006m86d^8.0 | clip_container_id:b006m86d | brand_container_id:b006m86d^8.0 | parent_id:b006m86d^9.0)) ()"
    parsedquery_toString: "+(id:b006m86d^10.0 | clip_episode_id:b006m86d | subseries_container_id:b006m86d^8.0 | series_container_id:b006m86d^8.0 | clip_container_id:b006m86d | brand_container_id:b006m86d^8.0 | parent_id:b006m86d^9.0) ()"
    explain: {
      b006m86d: "
        13.473297 = (MATCH) sum of:
          13.473297 = (MATCH) max of:
            13.473297 = (MATCH) fieldWeight(id:b006m86d in 27636), product of:
              1.0 = tf(termFreq(id:b006m86d)=1)
              13.473297 = idf(docFreq=2, maxDocs=783800)
              1.0 = fieldNorm(field=id, doc=27636)
      "
      b00y1w9h: "
        11.437143 = (MATCH) sum of:
          11.437143 = (MATCH) max of:
            11.437143 = (MATCH) weight(brand_container_id:b006m86d^8.0 in 61), product of:
              0.82407516 = queryWeight(brand_container_id:b006m86d^8.0), product of:
                8.0 = boost
                13.878762 = idf(docFreq=1, maxDocs=783800)
                0.007422088 = queryNorm
              13.878762 = (MATCH) fieldWeight(brand_container_id:b006m86d in 61), product of:
                1.0 = tf(termFreq(brand_container_id:b006m86d)=1)
                13.878762 = idf(docFreq=1, maxDocs=783800)
                1.0 = fieldNorm(field=brand_container_id, doc=61)
      "
    }
    QParser: "DisMaxQParser"
    altquerystring: null
    boostfuncs: null
    timing: {
      time: 51
      prepare: {
        time: 6
        org.apache.solr.handler.component.QueryComponent: { time: 5 }
        org.apache.solr.handler.component.FacetComponent: { time: 0 }
        org.apache.solr.handler.component.MoreLikeThisComponent: { time: 0 }
        org.apache.solr.handler.component.HighlightComponent: { time: 1 }
        org.apache.solr.handler.component.StatsComponent: { time: 0 }
        org.apache.solr.handler.component.DebugComponent: { time: 0 }
      }
      process: {
        time: 45
        org.apache.solr.handler.component.QueryComponent: { time: 27 }
        org.apache.solr.handler.component.FacetComponent: { time: 0 }
        org.apache.solr.handler.component.MoreLikeThisComponent: { time: 0 }
        org.apache.solr.handler.component.HighlightComponent: { time: 0 }
        org.apache.solr.handler.component.StatsComponent: { time: 0 }
        org.apache.solr.handler.component.DebugComponent: { time: 18 }
      }
    }
  }
}

On 15 June 2011 13:16, Erick Erickson <erickerick...@gmail.com> wrote:

> First off, you didn't "violate groups etiquette".
> In fact, yours was one of the better first posts in terms of providing
> enough information for us to actually help!
>
> A very useful page is the admin/analysis page, which shows how the
> analysis chain works. For instance, if you haven't changed the
> field type (i.e. <fieldType name="text">), your input is
> being broken up by WordDelimiterFilterFactory. Be sure to check
> the "verbose" checkbox and enter text in both the query and
> index boxes!
>
> Here's an invaluable page, though do note that it's not exhaustive:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> But on to your problem:
>
> First, boosting isn't absolute. Boosting terms just tends to
> bubble things up, so you have to experiment with various weights....
>
> To get the full comparison for both documents you're curious about,
> try using "explainOther". See:
>
> http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_doesn.27t_document_id:juggernaut_appear_in_the_top_10_results_for_my_query
>
> If you use that against the two docs in question, you should
> see (although it's a hard read!) the reason the docs got
> their relative scores.
>
> Finally, your next e-mail hints at what's happening. If you're
> putting multiple tokens in some of these fields, the length
> normalization may be causing the matches to score lower. You can
> try disabling those calculations (omitNorms="true" in your field
> definition).
> See:
>
> http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr
>
> String types accept spaces just fine, but you might want to define
> the fields with 'multiValued="true"' and index each as a separate
> field (note that won't work with a field that's also your <uniqueKey>).
>
> Best
> Erick
>
> On Wed, Jun 15, 2011 at 7:16 AM, Judioo <cont...@judioo.com> wrote:
> > <dynamicField name="*_id" type="text" indexed="true" stored="true"/>
> >
> > so all attributes except 'id' are of type text.
> >
> > I didn't know that about the string type. So is my problem as described
> > (that partial matches are contributing to the calculation), and does
> > defining the field type as string solve this problem?
> >
> > Or is my understanding completely incorrect?
> >
> > Thanks in advance
> >
> > On 15 June 2011 12:08, Ahmet Arslan <iori...@yahoo.com> wrote:
> >
> >> >
> >> > /solr/select/?q=b007vty6&defType=dismax&qf=id^10%20parent_id^9%20brand_container_id^8%20series_container_id^8%20subseries_container_id^8%20clip_container_id^1%20clip_episode_id^1&debugQuery=on&fl=id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episode_id,score&wt=json&indent=on
> >> >
> >> > same result (just higher scores). It's almost as if partial matches on
> >> > brand|series_container_id and id are being considered in the 1st document.
> >> > Surely this can't be right / expected?
> >>
> >> What is your fieldType definition? Don't you think it is better to use
> >> string type which is not tokenized?
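
For reference, the untokenized setup suggested in this thread (Ahmet's string
type, plus Erick's omitNorms and multiValued hints) might look roughly like the
fragment below in schema.xml. This is only a sketch under assumptions: the
"string" fieldType shown is the stock StrField definition and may already exist
in your schema, and multiValued="true" only makes sense if a single *_id field
really holds several identifiers.

```xml
<!-- Sketch only. Inside <types>: an untokenized type, so "b006m86d" is
     indexed as one exact term and is never split by WordDelimiterFilter.
     Assumption: skip this if your schema already defines "string". -->
<fieldType name="string" class="solr.StrField"
           sortMissingLast="true" omitNorms="true"/>

<!-- Inside <fields>: the dynamic field from the thread, switched from
     type="text" to type="string".
     Assumption: multiValued="true" is included only for the case where
     one field carries several ids; drop it for single-valued fields. -->
<dynamicField name="*_id" type="string" indexed="true" stored="true"
              multiValued="true"/>
```

After changing a field's type, the existing documents have to be reindexed
before queries see the new behaviour.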