Hi Erick,
On Thu, Sep 9, 2010 at 9:41 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> Could you show us the <fieldType> definitions for your fields? I suspect
> you're not getting the tokens you expect. This will almost certainly
> be true if the type is "string" rather than "text".
>

I should mention that I use Solr via the Drupal apachesolr module, which ships with its own schema.xml and solrconfig.xml files. Here are the ones I use:

http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.32.2.6&view=markup
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/solrconfig.xml?revision=1.1.2.18.2.6&view=markup

> The solr admin page (especially analysis) will help you a lot here, as
> will adding &debugQuery=on to your query and seeing how the query
> is actually processed. Showing us the results of this will also help.
>

The results below are from my local Solr endpoint with the debugQuery option enabled.

## 1 keyword, numFound="827"

http://localhost:8983/solr/select/?q=Synuclein&version=2.2&start=0&rows=10&indent=on&debugQuery=on

<str name="rawquerystring">Synuclein</str>
<str name="querystring">Synuclein</str>
<str name="parsedquery">+DisjunctionMaxQuery((tags_h1:synuclein^5.0 | body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 | tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 | tags_h2_h3:synuclein^3.0)~0.01) DisjunctionMaxQuery((body:synuclein^2.0)~0.01)</str>
<str name="parsedquery_toString">+(tags_h1:synuclein^5.0 | body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 | tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 | tags_h2_h3:synuclein^3.0)~0.01 (body:synuclein^2.0)~0.01</str>

## 2 keywords, numFound="88"

http://localhost:8983/solr/select/?q=Synuclein+animal&version=2.2&start=0&rows=10&indent=on&debugQuery=on

<str name="rawquerystring">Synuclein animal</str>
<str name="querystring">Synuclein animal</str>
<str name="parsedquery">+((DisjunctionMaxQuery((tags_h1:synuclein^5.0 | body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 | tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 | tags_h2_h3:synuclein^3.0)~0.01) DisjunctionMaxQuery((tags_h1:anim^5.0 | body:anim^40.0 | title:anim^5.0 | tags_h4_h5_h6:anim^2.0 | tags_inline:anim | name:anim^3.0 | taxonomy_names:anim^2.0 | tags_h2_h3:anim^3.0)~0.01))~2) DisjunctionMaxQuery((body:"synuclein anim"~15^2.0)~0.01)</str>
<str name="parsedquery_toString">+(((tags_h1:synuclein^5.0 | body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 | tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 | tags_h2_h3:synuclein^3.0)~0.01 (tags_h1:anim^5.0 | body:anim^40.0 | title:anim^5.0 | tags_h4_h5_h6:anim^2.0 | tags_inline:anim | name:anim^3.0 | taxonomy_names:anim^2.0 | tags_h2_h3:anim^3.0)~0.01)~2) (body:"synuclein anim"~15^2.0)~0.01</str>

## 3 keywords, numFound goes up: numFound="265"

http://localhost:8983/solr/select/?q=Synuclein+animal+dopamine&version=2.2&start=0&rows=10&indent=on&debugQuery=on

<str name="rawquerystring">Synuclein animal dopamine</str>
<str name="querystring">Synuclein animal dopamine</str>
<str name="parsedquery">+((DisjunctionMaxQuery((tags_h1:synuclein^5.0 | body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 | tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 | tags_h2_h3:synuclein^3.0)~0.01) DisjunctionMaxQuery((tags_h1:anim^5.0 | body:anim^40.0 | title:anim^5.0 | tags_h4_h5_h6:anim^2.0 | tags_inline:anim | name:anim^3.0 | taxonomy_names:anim^2.0 | tags_h2_h3:anim^3.0)~0.01) DisjunctionMaxQuery((tags_h1:dopamin^5.0 | body:dopamin^40.0 | title:dopamin^5.0 | tags_h4_h5_h6:dopamin^2.0 | tags_inline:dopamin | name:dopamin^3.0 | taxonomy_names:dopamin^2.0 | tags_h2_h3:dopamin^3.0)~0.01))~2) DisjunctionMaxQuery((body:"synuclein anim dopamin"~15^2.0)~0.01)</str>
<str name="parsedquery_toString">+(((tags_h1:synuclein^5.0 | body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 | tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 | tags_h2_h3:synuclein^3.0)~0.01 (tags_h1:anim^5.0 | body:anim^40.0 | title:anim^5.0 | tags_h4_h5_h6:anim^2.0 | tags_inline:anim | name:anim^3.0 | taxonomy_names:anim^2.0 | tags_h2_h3:anim^3.0)~0.01 (tags_h1:dopamin^5.0 | body:dopamin^40.0 | title:dopamin^5.0 | tags_h4_h5_h6:dopamin^2.0 | tags_inline:dopamin | name:dopamin^3.0 | taxonomy_names:dopamin^2.0 | tags_h2_h3:dopamin^3.0)~0.01)~2) (body:"synuclein anim dopamin"~15^2.0)~0.01</str>

## 4 keywords, numFound="45"

http://localhost:8983/solr/select/?q=Synuclein+animal+dopamine+calcium&version=2.2&start=0&rows=10&indent=on&debugQuery=on

<str name="rawquerystring">Synuclein animal dopamine calcium</str>
<str name="querystring">Synuclein animal dopamine calcium</str>
<str name="parsedquery">+((DisjunctionMaxQuery((tags_h1:synuclein^5.0 | body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 | tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 | tags_h2_h3:synuclein^3.0)~0.01) DisjunctionMaxQuery((tags_h1:anim^5.0 | body:anim^40.0 | title:anim^5.0 | tags_h4_h5_h6:anim^2.0 | tags_inline:anim | name:anim^3.0 | taxonomy_names:anim^2.0 | tags_h2_h3:anim^3.0)~0.01) DisjunctionMaxQuery((tags_h1:dopamin^5.0 | body:dopamin^40.0 | title:dopamin^5.0 | tags_h4_h5_h6:dopamin^2.0 | tags_inline:dopamin | name:dopamin^3.0 | taxonomy_names:dopamin^2.0 | tags_h2_h3:dopamin^3.0)~0.01) DisjunctionMaxQuery((tags_h1:calcium^5.0 | body:calcium^40.0 | title:calcium^5.0 | tags_h4_h5_h6:calcium^2.0 | tags_inline:calcium | name:calcium^3.0 | taxonomy_names:calcium^2.0 | tags_h2_h3:calcium^3.0)~0.01))~3) DisjunctionMaxQuery((body:"synuclein anim dopamin calcium"~15^2.0)~0.01)</str>
<str name="parsedquery_toString">+(((tags_h1:synuclein^5.0 | body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 | tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 | tags_h2_h3:synuclein^3.0)~0.01 (tags_h1:anim^5.0 | body:anim^40.0 | title:anim^5.0 | tags_h4_h5_h6:anim^2.0 | tags_inline:anim | name:anim^3.0 | taxonomy_names:anim^2.0 | tags_h2_h3:anim^3.0)~0.01 (tags_h1:dopamin^5.0 | body:dopamin^40.0 | title:dopamin^5.0 | tags_h4_h5_h6:dopamin^2.0 | tags_inline:dopamin | name:dopamin^3.0 | taxonomy_names:dopamin^2.0 | tags_h2_h3:dopamin^3.0)~0.01 (tags_h1:calcium^5.0 | body:calcium^40.0 | title:calcium^5.0 | tags_h4_h5_h6:calcium^2.0 | tags_inline:calcium | name:calcium^3.0 | taxonomy_names:calcium^2.0 | tags_h2_h3:calcium^3.0)~0.01)~3) (body:"synuclein anim dopamin calcium"~15^2.0)~0.01</str>

Does that help to figure out the sudden increase in hits when using 3 keywords?

Steph.

>
> Best
> Erick
>
> On Thu, Sep 9, 2010 at 9:34 AM, Stéphane Corlosquet
> <scorlosq...@gmail.com> wrote:
>
> > Hi all,
> >
> > I'm new to Solr, so please let me know if there is a more appropriate place
> > for my question below.
> >
> > I'm noticing a rather unexpected number of results when I add more keywords
> > to a search. I'm listing below an example (where I replaced the real
> > keywords with placeholders):
> >
> > keyword1 851 hits
> > keyword1 keyword2 90 hits
> > keyword1 keyword2 keyword3 269 hits
> > keyword1 keyword2 keyword3 keyword4 47 hits
> >
> > As you can see, adding k2 narrows down the number of results (as I would
> > expect), but adding k3 to k1 and k2 suddenly increases the number of
> > results. With 4 keywords, the results are narrowed down again. Would the
> > Solr/Lucene search algorithm for multiple keywords explain this
> > inconsistent behavior? I would think that adding more keywords would
> > narrow down my results.
> >
> > I'm pasting below the relevant log in case it helps:
> >
> > INFO: [] webapp=/solr path=/select/
> > params={spellcheck=true&facet=true&facet.mincount=1&facet.limit=20&spellcheck.q=keyword1+keyword2+keyword3+keyword4&json.nl=map&wt=json&version=1.2&rows=10&fl=id,nid,title,comment_count,type,created,changed,score,path,url,uid,name&start=0&facet.sort=true&q=keyword1+keyword2+keyword3+keyword4&bf=recip(rord(created),4,10704,10704)^200.0&facet.field=im_cck_field_author&facet.field=type&facet.field=im_vid_1=&indent=on&start=0&version=2.2&rows=10}
> > hits=10704 status=0 QTime=1
> >
> > Any hint on whether this is expected or not would be appreciated.
> >
> > Steph.
> >
>
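A note on the `~2` and `~3` suffixes on the outer clause group of each parsedquery in the debug output: in a dismax query these are the minimum-should-match counts. Both terms are required for 2 keywords (`~2`), only 2 of 3 for 3 keywords (`~2`), and 3 of 4 for 4 keywords (`~3`), which could explain a 3-keyword query matching more documents than a 2-keyword one. The Python sketch below approximates how Solr derives that count from the `mm` parameter. The spec `2<-35%` is an assumption picked only because it reproduces those counts; it has not been checked against the linked solrconfig.xml. `min_should_match` is a hypothetical helper, not a Solr API.

```python
import re


def min_should_match(clause_count: int, spec: str) -> int:
    """Approximate how dismax turns an `mm` spec into a required-clause count.

    Handles plain integers ("3", "-1"), percentages ("75%", "-35%") and
    space-separated conditional specs of the form "N<expr" (e.g. "2<-35%").
    Illustrative sketch only, not Solr's own code.
    """
    spec = spec.strip()
    if "<" in spec:
        # Normalize "2 < -35%" to "2<-35%" so each condition is one token.
        spec = re.sub(r"\s*<\s*", "<", spec)
        result = clause_count
        for condition in spec.split():
            upper, expr = condition.split("<", 1)
            if clause_count <= int(upper):
                # At or below this threshold the previous result applies
                # (all clauses, for the first condition).
                return result
            result = min_should_match(clause_count, expr)
        return result

    if spec.endswith("%"):
        calc = clause_count * int(spec[:-1]) / 100.0
        # A negative percentage means "this many clauses may be missing";
        # int() truncates toward zero.
        result = clause_count + int(calc) if calc < 0 else int(calc)
    else:
        calc = int(spec)
        result = clause_count + calc if calc < 0 else calc

    # Never require fewer than 0 or more than the available clauses.
    return max(0, min(result, clause_count))


if __name__ == "__main__":
    # Assumed spec, chosen to match the ~2 / ~2 / ~3 seen in the debug output.
    for terms in (1, 2, 3, 4):
        print(terms, "terms ->", min_should_match(terms, "2<-35%"), "required")
```

Under that assumed spec, 2 and 3 terms both require 2 matching clauses, so the 3-keyword query would accept any document containing 2 of its 3 (stemmed) terms.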
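The four debug URLs in the results differ only in `q`, so they are easy to regenerate, optionally with an explicit `mm` override (`mm=100%` asks dismax to require every term) to see whether the hit counts then shrink as keywords are added. This is a sketch assuming Python 3.7+ (dict insertion order), the local endpoint and parameters copied from the examples, and a request handler that sets `mm` under `defaults` rather than `invariants`, so that a request parameter can override it. `debug_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Local endpoint used in the examples above.
BASE = "http://localhost:8983/solr/select/"


def debug_url(keywords, mm=None):
    """Build a /select URL with debugQuery=on for the given keywords (sketch)."""
    params = {
        "q": " ".join(keywords),  # urlencode turns spaces into '+'
        "version": "2.2",
        "start": 0,
        "rows": 10,
        "indent": "on",
        "debugQuery": "on",
    }
    if mm is not None:
        # Assumed to override the handler's default mm for this request only.
        params["mm"] = mm
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    terms = ["Synuclein", "animal", "dopamine", "calcium"]
    for n in range(1, len(terms) + 1):
        print(debug_url(terms[:n], mm="100%"))
```

Without `mm`, the function reproduces the URLs used above; with `mm="100%"` the `%` is encoded as `%25`.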