Ok, I am kind of lost now. If I open the console > Analysis and run it, that's the final result.
Your suggestion is: get rid of the StopFilterFactory <filter> (the one that uses stopwords.txt) in schema.xml and, during the index phase, do replaceAll("in stopwords.txt", " ") myself and then add the documents to Solr. Is that correct?

Thanks David

> On 5 Nov 2019, at 14:48, David Hastings <hastings.recurs...@gmail.com> wrote:
>
> Fwd to another server
>
> no,
> <filter class="solr.StopFilterFactory" ignoreCase="true"
>         words="stopwords.txt"/>
>
> is still using stopwords and should be removed, in my opinion of course,
> based on your use case may be different, but i generally axe any reference
> to them at all
>
> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>
>> Thanks.
>> Haven't I done this here?
>> <fieldType name="text_field" class="solr.TextField"
>>            positionIncrementGap="100" omitNorms="false" >
>>   <analyzer type="index">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.ClassicFilterFactory"/>
>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords.txt"/>
>>   </analyzer>
>>
>>> On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com> wrote:
>>>
>>> Fwd to another server
>>>
>>> The first thing you should do is remove any reference to stop words and
>>> never use them, then re-index your data and try it again.
>>>
>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am performing a search to match a name (text_field), however this term
>>>> contains 'and' and 'a' and it doesn't return any records. If I remove 'a'
>>>> then it works.
>>>> e.g.
>>>> Search term: lymphoid and a non-lymphoid cell
>>>> doesn't work:
>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>
>>>> Search term: lymphoid and non-lymphoid cell
>>>> works:
>>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>> interested in the first result
>>>>
>>>> schema.xml
>>>> <field name="name" type="text_field"
>>>>        indexed="true" stored="true" omitNorms="false" required="true"
>>>>        multiValued="false"/>
>>>>
>>>> <analyzer type="query">
>>>>   <tokenizer class="solr.PatternTokenizerFactory"
>>>>              pattern="[^a-zA-Z0-9/._:]"/>
>>>>   <filter class="solr.PatternReplaceFilterFactory"
>>>>           pattern="^[/._:]+" replacement=""/>
>>>>   <filter class="solr.PatternReplaceFilterFactory"
>>>>           pattern="[/._:]+$" replacement=""/>
>>>>   <filter class="solr.PatternReplaceFilterFactory"
>>>>           pattern="[_]" replacement=" "/>
>>>>   <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>   <filter class="solr.LowerCaseFilterFactory"/>
>>>>   <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>           words="stopwords.txt"/>
>>>> </analyzer>
>>>>
>>>> <fieldType name="text_field" class="solr.TextField"
>>>>            positionIncrementGap="100" omitNorms="false" >
>>>>   <analyzer type="index">
>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>     <filter class="solr.ClassicFilterFactory"/>
>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>             words="stopwords.txt"/>
>>>>   </analyzer>
>>>>   <analyzer type="query">
>>>>     <tokenizer class="solr.PatternTokenizerFactory"
>>>>                pattern="[^a-zA-Z0-9/._:]"/>
>>>>     <filter class="solr.PatternReplaceFilterFactory"
>>>>             pattern="^[/._:]+" replacement=""/>
>>>>     <filter class="solr.PatternReplaceFilterFactory"
>>>>             pattern="[/._:]+$" replacement=""/>
>>>>     <filter class="solr.PatternReplaceFilterFactory"
>>>>             pattern="[_]" replacement=" "/>
>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>             words="stopwords.txt"/>
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>> stopwords.txt
>>>> #Standard english stop words taken from Lucene's StopAnalyzer
>>>> a
>>>> b
>>>> c
>>>> ....
>>>> an
>>>> and
>>>> are
>>>>
>>>> Running Solr 6.6.2.
>>>>
>>>> Is there anything I could do to prevent this?
>>>>
>>>> Thanks
>>>> Guilherme
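For reference, a minimal sketch of what David's suggestion would look like in schema.xml, assuming the rest of the text_field definition stays exactly as posted above: the same field type with the solr.StopFilterFactory line dropped from both the index and the query analyzer. This is only an illustration of the suggested change; the thread does not confirm that it fixes the failing query, and the data would need to be re-indexed afterwards.

```xml
<!-- Sketch only: text_field as posted in the thread, with the StopFilterFactory
     removed from both analyzers. All other filters are unchanged. -->
<fieldType name="text_field" class="solr.TextField"
           positionIncrementGap="100" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ClassicFilterFactory"/>
    <!-- Still drops tokens shorter than 2 or longer than 20 characters -->
    <filter class="solr.LengthFilterFactory" min="2" max="20"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="[^a-zA-Z0-9/._:]"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^[/._:]+" replacement=""/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[/._:]+$" replacement=""/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[_]" replacement=" "/>
    <filter class="solr.LengthFilterFactory" min="2" max="20"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that solr.LengthFilterFactory with min="2" still removes one-letter tokens such as 'a' in both analyzers, independently of stopwords.txt, so 'a' would not become searchable in this field from this change alone.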