Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

David Hastings Tue, 05 Nov 2019 06:49:34 -0800

No,

               <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

is still using stop words and should be removed. That's my opinion, of course,
and your use case may be different, but I generally axe any reference to them
at all.
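
To make that concrete, here is a rough sketch of the text_field type from the schema quoted below with the StopFilterFactory line dropped from both the index and the query analyzer. Everything else is copied as posted; I'm assuming nothing else in the schema depends on stopwords.txt, and I haven't run this against your index.

```xml
<!-- Sketch only: text_field as quoted in this thread, minus both
     StopFilterFactory lines. All other settings are unchanged. -->
<fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ClassicFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="20"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
    <filter class="solr.LengthFilterFactory" min="2" max="20"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

You'd need to re-index after changing the index analyzer. Also note that LengthFilterFactory with min="2" will still drop single-character tokens such as 'a' on both sides, so that filter may be worth a look too.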

On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

> Thanks.
> Haven't I done this here ?
>   <fieldType name="text_field" class="solr.TextField"
> positionIncrementGap="100" omitNorms="false" >
>            <analyzer type="index">
>                <tokenizer class="solr.StandardTokenizerFactory"/>
>                <filter class="solr.ClassicFilterFactory"/>
>                <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>                <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
>            </analyzer>
>
>
> > On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com>
> wrote:
> >
> > Fwd to another server
> >
> > The first thing you should do is remove any reference to stop words and
> > never use them, then re-index your data and try it again.
> >
> > On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk>
> wrote:
> >
> >> Hi,
> >>
> >> I am performing a search to match a name (text_field), however this term
> >> contains 'and' and 'a' and it doesn't return any records. If I remove
> >> 'a' then it works.
> >> e.g.
> >> Search Term: lymphoid and a non-lymphoid cell
> >> doesn't work:
> >> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>
> >> Search term: lymphoid and non-lymphoid cell
> >> works:
> >> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >> interested in the first result
> >>
> >> schema.xml
> >> <field name="name"                          type="text_field"
> >> indexed="true"  stored="true"   omitNorms="false"   required="true"
> >> multiValued="false"/>
> >>
> >>            <analyzer type="query">
> >>                <tokenizer class="solr.PatternTokenizerFactory"
> >> pattern="[^a-zA-Z0-9/._:]"/>
> >>                <filter class="solr.PatternReplaceFilterFactory"
> >> pattern="^[/._:]+" replacement=""/>
> >>                <filter class="solr.PatternReplaceFilterFactory"
> >> pattern="[/._:]+$" replacement=""/>
> >>                <filter class="solr.PatternReplaceFilterFactory"
> >> pattern="[_]" replacement=" "/>
> >>                <filter class="solr.LengthFilterFactory" min="2"
> >> max="20"/>
> >>                <filter class="solr.LowerCaseFilterFactory"/>
> >>                <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> words="stopwords.txt"/>
> >>            </analyzer>
> >>
> >>        <fieldType name="text_field" class="solr.TextField"
> >> positionIncrementGap="100" omitNorms="false" >
> >>            <analyzer type="index">
> >>                <tokenizer class="solr.StandardTokenizerFactory"/>
> >>                <filter class="solr.ClassicFilterFactory"/>
> >>                <filter class="solr.LengthFilterFactory" min="2"
> >> max="20"/>
> >>                <filter class="solr.LowerCaseFilterFactory"/>
> >>                <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> words="stopwords.txt"/>
> >>            </analyzer>
> >>            <analyzer type="query">
> >>                <tokenizer class="solr.PatternTokenizerFactory"
> >> pattern="[^a-zA-Z0-9/._:]"/>
> >>                <filter class="solr.PatternReplaceFilterFactory"
> >> pattern="^[/._:]+" replacement=""/>
> >>                <filter class="solr.PatternReplaceFilterFactory"
> >> pattern="[/._:]+$" replacement=""/>
> >>                <filter class="solr.PatternReplaceFilterFactory"
> >> pattern="[_]" replacement=" "/>
> >>                <filter class="solr.LengthFilterFactory" min="2"
> >> max="20"/>
> >>                <filter class="solr.LowerCaseFilterFactory"/>
> >>                <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> words="stopwords.txt"/>
> >>            </analyzer>
> >>        </fieldType>
> >>
> >> stopwords.txt
> >> #Standard english stop words taken from Lucene's StopAnalyzer
> >> a
> >> b
> >> c
> >> ....
> >> an
> >> and
> >> are
> >>
> >> Running Solr 6.6.2.
> >>
> >> Is there anything I could do to prevent this ?
> >>
> >> Thanks
> >> Guilherme
>
>
