Hi Erick,

On Thu, Sep 9, 2010 at 9:41 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> Could you show us the <fieldType> definitions for your fields? I suspect
> you're not getting the tokens you expect. This will almost certainly
> be true if the type is "string" rather than "text".
>

I should mention that I use Solr via the Drupal apachesolr module, which
ships with its own schema.xml and solrconfig.xml files. Here are the ones I use:

http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.32.2.6&view=markup

http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/solrconfig.xml?revision=1.1.2.18.2.6&view=markup


> The solr admin page (especially analysis) will help you a lot here, as
> will adding &debugQuery=on to your query and seeing how the query
> is actually processed. Showing us the results of this will also help.
>

The results below are from my local Solr endpoint with the debugQuery option
enabled, e.g.:

http://localhost:8983/solr/select/?q=Synuclein&version=2.2&start=0&rows=10&indent=on&debugQuery=on
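To rerun the same debug queries for each keyword combination, the URLs can be generated rather than typed by hand. This is a minimal sketch that assumes the local endpoint http://localhost:8983/solr/select/ used above; `debug_url` is a hypothetical helper that only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode

# Assumed local endpoint, matching the URLs in this message.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def debug_url(keywords):
    """Build a /select URL with debugQuery=on for a list of keywords.

    Hypothetical helper: it only formats the URL, it sends no request.
    """
    params = {
        "q": " ".join(keywords),  # urlencode encodes the spaces as '+'
        "version": "2.2",
        "start": "0",
        "rows": "10",
        "indent": "on",
        "debugQuery": "on",
    }
    return SOLR_SELECT + "?" + urlencode(params)


terms = ["Synuclein", "animal", "dopamine", "calcium"]
for n in range(1, len(terms) + 1):
    print(debug_url(terms[:n]))
```

The four printed URLs match the ones used for the 1- to 4-keyword runs below.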

## 1 keyword, numFound="827"

http://localhost:8983/solr/select/?q=Synuclein&version=2.2&start=0&rows=10&indent=on&debugQuery=on

 <str name="rawquerystring">Synuclein</str>
 <str name="querystring">Synuclein</str>
 <str name="parsedquery">+DisjunctionMaxQuery((tags_h1:synuclein^5.0 |
body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 |
tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 |
tags_h2_h3:synuclein^3.0)~0.01)
DisjunctionMaxQuery((body:synuclein^2.0)~0.01)</str>
 <str name="parsedquery_toString">+(tags_h1:synuclein^5.0 |
body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 |
tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 |
tags_h2_h3:synuclein^3.0)~0.01 (body:synuclein^2.0)~0.01</str>

## 2 keywords, numFound="88"

http://localhost:8983/solr/select/?q=Synuclein+animal&version=2.2&start=0&rows=10&indent=on&debugQuery=on

 <str name="rawquerystring">Synuclein animal</str>
 <str name="querystring">Synuclein animal</str>
 <str name="parsedquery">+((DisjunctionMaxQuery((tags_h1:synuclein^5.0 |
body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 |
tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 |
tags_h2_h3:synuclein^3.0)~0.01) DisjunctionMaxQuery((tags_h1:anim^5.0 |
body:anim^40.0 | title:anim^5.0 | tags_h4_h5_h6:anim^2.0 | tags_inline:anim
| name:anim^3.0 | taxonomy_names:anim^2.0 | tags_h2_h3:anim^3.0)~0.01))~2)
DisjunctionMaxQuery((body:"synuclein anim"~15^2.0)~0.01)</str>
 <str name="parsedquery_toString">+(((tags_h1:synuclein^5.0 |
body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 |
tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 |
tags_h2_h3:synuclein^3.0)~0.01 (tags_h1:anim^5.0 | body:anim^40.0 |
title:anim^5.0 | tags_h4_h5_h6:anim^2.0 | tags_inline:anim | name:anim^3.0 |
taxonomy_names:anim^2.0 | tags_h2_h3:anim^3.0)~0.01)~2) (body:"synuclein
anim"~15^2.0)~0.01</str>

## 3 keywords, numFound goes up: numFound="265"

http://localhost:8983/solr/select/?q=Synuclein+animal+dopamine&version=2.2&start=0&rows=10&indent=on&debugQuery=on

 <str name="rawquerystring">Synuclein animal dopamine</str>
 <str name="querystring">Synuclein animal dopamine</str>
 <str name="parsedquery">+((DisjunctionMaxQuery((tags_h1:synuclein^5.0 |
body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 |
tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 |
tags_h2_h3:synuclein^3.0)~0.01) DisjunctionMaxQuery((tags_h1:anim^5.0 |
body:anim^40.0 | title:anim^5.0 | tags_h4_h5_h6:anim^2.0 | tags_inline:anim
| name:anim^3.0 | taxonomy_names:anim^2.0 | tags_h2_h3:anim^3.0)~0.01)
DisjunctionMaxQuery((tags_h1:dopamin^5.0 | body:dopamin^40.0 |
title:dopamin^5.0 | tags_h4_h5_h6:dopamin^2.0 | tags_inline:dopamin |
name:dopamin^3.0 | taxonomy_names:dopamin^2.0 |
tags_h2_h3:dopamin^3.0)~0.01))~2) DisjunctionMaxQuery((body:"synuclein anim
dopamin"~15^2.0)~0.01)</str>
 <str name="parsedquery_toString">+(((tags_h1:synuclein^5.0 |
body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 |
tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 |
tags_h2_h3:synuclein^3.0)~0.01 (tags_h1:anim^5.0 | body:anim^40.0 |
title:anim^5.0 | tags_h4_h5_h6:anim^2.0 | tags_inline:anim | name:anim^3.0 |
taxonomy_names:anim^2.0 | tags_h2_h3:anim^3.0)~0.01 (tags_h1:dopamin^5.0 |
body:dopamin^40.0 | title:dopamin^5.0 | tags_h4_h5_h6:dopamin^2.0 |
tags_inline:dopamin | name:dopamin^3.0 | taxonomy_names:dopamin^2.0 |
tags_h2_h3:dopamin^3.0)~0.01)~2) (body:"synuclein anim
dopamin"~15^2.0)~0.01</str>

## 4 keywords, numFound="45"

http://localhost:8983/solr/select/?q=Synuclein+animal+dopamine+calcium&version=2.2&start=0&rows=10&indent=on&debugQuery=on

 <str name="rawquerystring">Synuclein animal dopamine calcium</str>
 <str name="querystring">Synuclein animal dopamine calcium</str>
 <str name="parsedquery">+((DisjunctionMaxQuery((tags_h1:synuclein^5.0 |
body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 |
tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 |
tags_h2_h3:synuclein^3.0)~0.01) DisjunctionMaxQuery((tags_h1:anim^5.0 |
body:anim^40.0 | title:anim^5.0 | tags_h4_h5_h6:anim^2.0 | tags_inline:anim
| name:anim^3.0 | taxonomy_names:anim^2.0 | tags_h2_h3:anim^3.0)~0.01)
DisjunctionMaxQuery((tags_h1:dopamin^5.0 | body:dopamin^40.0 |
title:dopamin^5.0 | tags_h4_h5_h6:dopamin^2.0 | tags_inline:dopamin |
name:dopamin^3.0 | taxonomy_names:dopamin^2.0 |
tags_h2_h3:dopamin^3.0)~0.01) DisjunctionMaxQuery((tags_h1:calcium^5.0 |
body:calcium^40.0 | title:calcium^5.0 | tags_h4_h5_h6:calcium^2.0 |
tags_inline:calcium | name:calcium^3.0 | taxonomy_names:calcium^2.0 |
tags_h2_h3:calcium^3.0)~0.01))~3) DisjunctionMaxQuery((body:"synuclein anim
dopamin calcium"~15^2.0)~0.01)</str>
 <str name="parsedquery_toString">+(((tags_h1:synuclein^5.0 |
body:synuclein^40.0 | title:synuclein^5.0 | tags_h4_h5_h6:synuclein^2.0 |
tags_inline:synuclein | name:synuclein^3.0 | taxonomy_names:synuclein^2.0 |
tags_h2_h3:synuclein^3.0)~0.01 (tags_h1:anim^5.0 | body:anim^40.0 |
title:anim^5.0 | tags_h4_h5_h6:anim^2.0 | tags_inline:anim | name:anim^3.0 |
taxonomy_names:anim^2.0 | tags_h2_h3:anim^3.0)~0.01 (tags_h1:dopamin^5.0 |
body:dopamin^40.0 | title:dopamin^5.0 | tags_h4_h5_h6:dopamin^2.0 |
tags_inline:dopamin | name:dopamin^3.0 | taxonomy_names:dopamin^2.0 |
tags_h2_h3:dopamin^3.0)~0.01 (tags_h1:calcium^5.0 | body:calcium^40.0 |
title:calcium^5.0 | tags_h4_h5_h6:calcium^2.0 | tags_inline:calcium |
name:calcium^3.0 | taxonomy_names:calcium^2.0 |
tags_h2_h3:calcium^3.0)~0.01)~3) (body:"synuclein anim dopamin
calcium"~15^2.0)~0.01</str>
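One way to read the `~N` suffixes above: in Lucene's query string output, `(...)~N` after a boolean group means at least N of its optional clauses must match, while `~0.01` after each DisjunctionMaxQuery is its tie-breaker. The parsed queries show `~2` for both the 2- and 3-keyword queries and `~3` for 4 keywords. With 3 keywords, any 2 of the 3 terms are enough, so every document matching both of the first 2 terms still matches, plus documents matching other pairs; the result set can only grow. The sketch below reproduces those N values from a dismax `mm` spec. The spec `2<-35%` is an assumption (the actual `mm` value would be in the solrconfig.xml linked above), `min_should_match` is a simplified illustrative reimplementation rather than Solr's code, and the toy documents are hypothetical.

```python
def min_should_match(clause_count, spec):
    """Simplified sketch of a dismax-style mm spec such as "2<-35%".

    Supports a single "n<value" condition and a percentage or integer
    value; illustrative only, not Solr's implementation.
    """
    spec = spec.strip()
    if "<" in spec:
        upper, value = spec.split("<", 1)
        if clause_count <= int(upper):
            return clause_count  # at or below the bound: all clauses required
        spec = value
    if spec.endswith("%"):
        calc = clause_count * int(spec[:-1]) / 100
        # int() truncates toward zero, e.g. -1.05 -> -1
        result = clause_count + int(calc) if calc < 0 else int(calc)
    else:
        calc = int(spec)
        result = clause_count + calc if calc < 0 else calc
    return max(0, min(clause_count, result))


# Hypothetical toy index: each document is the set of stemmed terms it contains.
docs = [
    {"synuclein", "anim"},
    {"synuclein", "dopamin"},
    {"anim", "dopamin"},
    {"synuclein", "anim", "dopamin"},
    {"synuclein"},
]


def count_hits(terms, spec="2<-35%"):
    """Count toy documents containing at least mm of the query terms."""
    need = min_should_match(len(terms), spec)
    return sum(1 for d in docs if len(d & set(terms)) >= need)


for terms in (["synuclein"], ["synuclein", "anim"],
              ["synuclein", "anim", "dopamin"],
              ["synuclein", "anim", "dopamin", "calcium"]):
    print(len(terms), "terms, mm =", min_should_match(len(terms), "2<-35%"),
          "hits =", count_hits(terms))
```

On this toy data the hit counts come out 4, 2, 4, 1, the same up/down pattern as the real counts.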

Does that help explain the sudden increase in hits when using 3
keywords?

Steph.


>
> Best
> Erick
>
> On Thu, Sep 9, 2010 at 9:34 AM, Stéphane Corlosquet
> <scorlosq...@gmail.com> wrote:
>
> > Hi all,
> >
> > I'm new to Solr, so please let me know if there is a more appropriate
> > place for my question below.
> >
> > I'm seeing an unexpected number of results when I add more keywords to
> > a search. Below is an example (I replaced the real keywords with
> > placeholders):
> >
> > keyword1                               851 hits
> > keyword1 keyword2                       90 hits
> > keyword1 keyword2 keyword3             269 hits
> > keyword1 keyword2 keyword3 keyword4     47 hits
> >
> > As you can see, adding keyword2 narrows down the number of results (as I
> > would expect), but adding keyword3 to keyword1 and keyword2 suddenly
> > increases the number of results. With 4 keywords, the results are
> > narrowed down again. Would the Solr/Lucene search algorithm for multiple
> > keywords explain this inconsistent behavior? I would expect adding more
> > keywords to narrow down my results.
> >
> > Here is the relevant log entry in case it helps:
> >
> > INFO: [] webapp=/solr path=/select/ params={spellcheck=true&facet=true&facet.mincount=1&facet.limit=20&spellcheck.q=keyword1+keyword2+keyword3+keyword4&json.nl=map&wt=json&version=1.2&rows=10&fl=id,nid,title,comment_count,type,created,changed,score,path,url,uid,name&start=0&facet.sort=true&q=keyword1+keyword2+keyword3+keyword4&bf=recip(rord(created),4,10704,10704)^200.0&facet.field=im_cck_field_author&facet.field=type&facet.field=im_vid_1=&indent=on&start=0&version=2.2&rows=10}
> > hits=10704 status=0 QTime=1
> >
> > Any hint on whether this is expected or not would be appreciated.
> >
> > Steph.
> >
>
