Hi Walter,

> The solr.StopFilter removes all tokens that are stopwords. Those words will
> not be in the index, so they can never match a query.

I think the OP's concern is that the results change when a stopword is added to the query. I think he's using the filter factory correctly: the query chain includes the filter as well, so it should remove "a" while querying.

*@Guilherme*, please post the results for both queries, the document in the results that you are concerned about, and the full output of the analysis screen (for both query and index).

On Tue, 5 Nov 2019 at 21:38, Walter Underwood <wun...@wunderwood.org> wrote:

> No.
>
> The solr.StopFilter removes all tokens that are stopwords. Those words
> will not be in the index, so they can never match a query.
>
> 1. Remove the lines with solr.StopFilter from every analysis chain in
> schema.xml.
> 2. Reload the collection, restart Solr, or whatever to read the new config.
> 3. Reindex all of the documents.
>
> When indexed with the new analysis chain, the stopwords will not be
> removed and they will be searchable.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >
> > Ok. I am kind a lost now.
> > If I open up the console > analysis and perform it, that's the final result.
> > <Screenshot 2019-11-05 at 14.54.16.png>
> >
> > Your suggestion is: get rid of the <filter stopword.txt> in the schema.xml and during index phase replaceAll("in stopwords.txt"," ") then add to solr. Is that correct ?
> >
> > Thanks David
> >
> >> On 5 Nov 2019, at 14:48, David Hastings <hastings.recurs...@gmail.com> wrote:
> >>
> >> Fwd to another server
> >>
> >> no,
> >> <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> words="stopwords.txt"/>
> >>
> >> is still using stopwords and should be removed, in my opinion of course,
> >> based on your use case may be different, but i generally axe any reference
> >> to them at all
> >>
> >> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>
> >>> Thanks.
> >>> Haven't I done this here ?
> >>> <fieldType name="text_field" class="solr.TextField"
> >>>            positionIncrementGap="100" omitNorms="false" >
> >>>     <analyzer type="index">
> >>>         <tokenizer class="solr.StandardTokenizerFactory"/>
> >>>         <filter class="solr.ClassicFilterFactory"/>
> >>>         <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>         <filter class="solr.LowerCaseFilterFactory"/>
> >>>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>>                 words="stopwords.txt"/>
> >>>     </analyzer>
> >>>
> >>>
> >>>> On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com> wrote:
> >>>>
> >>>> Fwd to another server
> >>>>
> >>>> The first thing you should do is remove any reference to stop words and
> >>>> never use them, then re-index your data and try it again.
> >>>>
> >>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I am performing a search to match a name (text_field), however this term
> >>>>> contains 'and' and 'a' and it doesn't return any records. If i remove 'a'
> >>>>> then it works.
> >>>>> e.g
> >>>>> Search Term: lymphoid and a non-lymphoid cell
> >>>>> doesn't work:
> >>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>>>>
> >>>>> Search term: lymphoid and non-lymphoid cell
> >>>>> works:
> >>>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>>>> interested in the first result
> >>>>>
> >>>>> schema.xml
> >>>>> <field name="name" type="text_field"
> >>>>>        indexed="true" stored="true" omitNorms="false" required="true"
> >>>>>        multiValued="false"/>
> >>>>>
> >>>>> <analyzer type="query">
> >>>>>     <tokenizer class="solr.PatternTokenizerFactory"
> >>>>>                pattern="[^a-zA-Z0-9/._:]"/>
> >>>>>     <filter class="solr.PatternReplaceFilterFactory"
> >>>>>             pattern="^[/._:]+" replacement=""/>
> >>>>>     <filter class="solr.PatternReplaceFilterFactory"
> >>>>>             pattern="[/._:]+$" replacement=""/>
> >>>>>     <filter class="solr.PatternReplaceFilterFactory"
> >>>>>             pattern="[_]" replacement=" "/>
> >>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>>>>             words="stopwords.txt"/>
> >>>>> </analyzer>
> >>>>>
> >>>>> <fieldType name="text_field" class="solr.TextField"
> >>>>>            positionIncrementGap="100" omitNorms="false" >
> >>>>>     <analyzer type="index">
> >>>>>         <tokenizer class="solr.StandardTokenizerFactory"/>
> >>>>>         <filter class="solr.ClassicFilterFactory"/>
> >>>>>         <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>         <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>>>>                 words="stopwords.txt"/>
> >>>>>     </analyzer>
> >>>>>     <analyzer type="query">
> >>>>>         <tokenizer class="solr.PatternTokenizerFactory"
> >>>>>                    pattern="[^a-zA-Z0-9/._:]"/>
> >>>>>         <filter class="solr.PatternReplaceFilterFactory"
> >>>>>                 pattern="^[/._:]+" replacement=""/>
> >>>>>         <filter class="solr.PatternReplaceFilterFactory"
> >>>>>                 pattern="[/._:]+$" replacement=""/>
> >>>>>         <filter class="solr.PatternReplaceFilterFactory"
> >>>>>                 pattern="[_]" replacement=" "/>
> >>>>>         <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>         <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>>>>                 words="stopwords.txt"/>
> >>>>>     </analyzer>
> >>>>> </fieldType>
> >>>>>
> >>>>> stopwords.txt
> >>>>> #Standard english stop words taken from Lucene's StopAnalyzer
> >>>>> a
> >>>>> b
> >>>>> c
> >>>>> ....
> >>>>> an
> >>>>> and
> >>>>> are
> >>>>>
> >>>>> Running SolR 6.6.2.
> >>>>>
> >>>>> Is there anything I could do to prevent this ?
> >>>>>
> >>>>> Thanks
> >>>>> Guilherme

--
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn: *8173*
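
P.S. For reference, in case you do want to try Walter's step 1, here is a rough sketch of the text_field type from the quoted schema with both StopFilterFactory lines removed. It is only an illustration built from the snippet posted in this thread; it assumes no other field type in schema.xml references stopwords.txt, and steps 2 and 3 (reload, then reindex everything) are still needed afterwards.

```xml
<!-- Sketch only: text_field as quoted in this thread, minus the two
     StopFilterFactory lines. Everything else is unchanged. -->
<fieldType name="text_field" class="solr.TextField"
           positionIncrementGap="100" omitNorms="false">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ClassicFilterFactory"/>
        <!-- Still drops tokens shorter than 2 or longer than 20 characters -->
        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.PatternTokenizerFactory"
                   pattern="[^a-zA-Z0-9/._:]"/>
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="^[/._:]+" replacement=""/>
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="[/._:]+$" replacement=""/>
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="[_]" replacement=" "/>
        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

Note that the LengthFilterFactory with min="2" sits in both chains, so a one-character token such as "a" would still be dropped even without the stop filter. The analysis screen output for both index and query should show at which stage each token disappears.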