Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

Guilherme Viteri Tue, 05 Nov 2019 06:57:44 -0800

OK, I am kind of lost now.
If I open the console's Analysis screen and run the analysis, that's the final result.


Your suggestion is: get rid of the StopFilterFactory filter (words="stopwords.txt")
in schema.xml and, during the index phase, replace every word listed in
stopwords.txt with " " (a replaceAll) before adding the documents to Solr. Is
that correct?
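
A minimal sketch of the first half of that suggestion, assuming the stop filter is simply dropped from both analyzers of text_field and every other filter stays exactly as posted further down this thread. It is an illustration rather than a tested configuration, and the data would need to be re-indexed after the change:

```xml
<fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ClassicFilterFactory"/>
        <!-- Note: min="2" still drops one-character tokens such as "a" -->
        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- StopFilterFactory removed from the index analyzer -->
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- StopFilterFactory removed from the query analyzer as well -->
    </analyzer>
</fieldType>
```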

Thanks David

> On 5 Nov 2019, at 14:48, David Hastings <hastings.recurs...@gmail.com> wrote:
> 
> no,
>               <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
> 
> is still using stopwords and should be removed. That's my opinion, of course,
> and your use case may be different, but I generally axe any reference
> to them at all.
> 
> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> 
>> Thanks.
>> Haven't I done this here ?
>>  <fieldType name="text_field" class="solr.TextField"
>> positionIncrementGap="100" omitNorms="false" >
>>           <analyzer type="index">
>>               <tokenizer class="solr.StandardTokenizerFactory"/>
>>               <filter class="solr.ClassicFilterFactory"/>
>>               <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>               <filter class="solr.LowerCaseFilterFactory"/>
>>               <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords.txt"/>
>>           </analyzer>
>> 
>> 
>>> On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com> wrote:
>>> 
>>> The first thing you should do is remove any reference to stop words and
>>> never use them, then re-index your data and try it again.
>>> 
>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I am performing a search to match a name (text_field). However, the search
>>>> term contains 'and' and 'a', and it doesn't return any records. If I remove
>>>> 'a', then it works.
>>>> e.g.
>>>> Search Term: lymphoid and a non-lymphoid cell
>>>> doesn't work:
>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>> 
>>>> Search term: lymphoid and non-lymphoid cell
>>>> works:
>>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>> interested in the first result
>>>> 
>>>> schema.xml
>>>> <field name="name"                          type="text_field"
>>>> indexed="true"  stored="true"   omitNorms="false"   required="true"
>>>> multiValued="false"/>
>>>> 
>>>>           <analyzer type="query">
>>>>               <tokenizer class="solr.PatternTokenizerFactory"
>>>> pattern="[^a-zA-Z0-9/._:]"/>
>>>>               <filter class="solr.PatternReplaceFilterFactory"
>>>> pattern="^[/._:]+" replacement=""/>
>>>>               <filter class="solr.PatternReplaceFilterFactory"
>>>> pattern="[/._:]+$" replacement=""/>
>>>>               <filter class="solr.PatternReplaceFilterFactory"
>>>> pattern="[_]" replacement=" "/>
>>>>               <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>               <filter class="solr.LowerCaseFilterFactory"/>
>>>>               <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>> words="stopwords.txt"/>
>>>>           </analyzer>
>>>> 
>>>>       <fieldType name="text_field" class="solr.TextField"
>>>> positionIncrementGap="100" omitNorms="false" >
>>>>           <analyzer type="index">
>>>>               <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>               <filter class="solr.ClassicFilterFactory"/>
>>>>               <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>               <filter class="solr.LowerCaseFilterFactory"/>
>>>>               <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>> words="stopwords.txt"/>
>>>>           </analyzer>
>>>>           <analyzer type="query">
>>>>               <tokenizer class="solr.PatternTokenizerFactory"
>>>> pattern="[^a-zA-Z0-9/._:]"/>
>>>>               <filter class="solr.PatternReplaceFilterFactory"
>>>> pattern="^[/._:]+" replacement=""/>
>>>>               <filter class="solr.PatternReplaceFilterFactory"
>>>> pattern="[/._:]+$" replacement=""/>
>>>>               <filter class="solr.PatternReplaceFilterFactory"
>>>> pattern="[_]" replacement=" "/>
>>>>               <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>               <filter class="solr.LowerCaseFilterFactory"/>
>>>>               <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>> words="stopwords.txt"/>
>>>>           </analyzer>
>>>>       </fieldType>
>>>> 
>>>> stopwords.txt
>>>> #Standard english stop words taken from Lucene's StopAnalyzer
>>>> a
>>>> b
>>>> c
>>>> ....
>>>> an
>>>> and
>>>> are
>>>> 
>>>> Running Solr 6.6.2.
>>>> 
>>>> Is there anything I could do to prevent this ?
>>>> 
>>>> Thanks
>>>> Guilherme
>> 
>>
