Hi Jack,

> it will use the internal *Lucene hardwired list* of stop words

I am not aware of this list. Could you please give me more information about it?

With Regards
Aman Tandon

On Tue, Jul 15, 2014 at 7:21 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> You could try experimenting with CommonGramsFilterFactory and
> CommonGramsQueryFilter (slightly different). There are actually a lot
> of cool analyzers bundled with Solr. You can find the full list on my site
> at: http://www.solr-start.com/info/analyzers
>
> Regards,
> Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On Tue, Jul 15, 2014 at 8:42 AM, Teague James <teag...@insystechinc.com> wrote:
> > Alex,
> >
> > Thanks! Great suggestion. I figured out that it was the
> > EdgeNGramFilterFactory. Taking that out of the mix did it.
> >
> > -Teague
> >
> > -----Original Message-----
> > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> > Sent: Monday, July 14, 2014 9:14 PM
> > To: solr-user
> > Subject: Re: Of, To, and Other Small Words
> >
> > Have you tried the Admin UI's Analyze screen? It will show you
> > what happens to the text as it progresses through the tokenizers and
> > filters. No need to reindex.
> >
> > Regards,
> > Alex.
> >
> >
> > On Tue, Jul 15, 2014 at 8:10 AM, Teague James <teag...@insystechinc.com> wrote:
> >> Hi Anshum,
> >>
> >> Thanks for replying and suggesting this, but the field type I am using
> >> (a modified text_general) in my schema has the file set to 'stopwords.txt'.
> >>
> >> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
> >>   <analyzer type="index">
> >>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
> >>     <!-- in this example, we will only use synonyms at query time
> >>     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>-->
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <!-- CHANGE: The NGramFilterFactory was added to provide partial word search. This can be changed to
> >>          EdgeNGramFilterFactory side="front" to only match front sided partial searches if matching any
> >>          part of a word is undesirable. -->
> >>     <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="10" />
> >>     <!-- CHANGE: The PorterStemFilterFactory was added to allow matches for 'cat' and 'cats' by searching for 'cat' -->
> >>     <filter class="solr.PorterStemFilterFactory"/>
> >>   </analyzer>
> >>   <analyzer type="query">
> >>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
> >>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <!-- CHANGE: The PorterStemFilterFactory was added to allow matches for 'cat' and 'cats' by searching for 'cat' -->
> >>     <filter class="solr.PorterStemFilterFactory"/>
> >>   </analyzer>
> >> </fieldType>
> >>
> >> Just to be double sure, I cleared the list in stopwords_en.txt, restarted Solr,
> >> re-indexed, and searched, still with zero results. Any other suggestions on
> >> where I might be able to control this behavior?
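One reading of this config that fits Teague's later finding: the index analyzer runs NGramFilterFactory with minGramSize="3", but the query analyzer has no n-gram step. A two-letter token such as "of" yields no grams at all, so it never reaches the index, while the query side still emits it, and the phrase "knowledge of science" cannot match. Single words like "knowledge" (9 letters, within maxGramSize="10") still hit because the full word is one of its own grams. The config quoted here shows NGramFilterFactory while Teague's later note names EdgeNGramFilterFactory; either would be an index-only step. Below is a minimal sketch, not a tested schema, of the same field type with the n-gram filter removed so both analyzers produce comparable tokens; everything else is copied from the quoted config.

```xml
<!-- Sketch only: the quoted text_general type with the index-side n-gram filter removed.
     Index and query chains now differ only in the query-time synonym step, as before. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- stopwords.txt is blank in this setup, so "of" and "to" pass through -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- NGramFilterFactory removed: with minGramSize="3" it emitted nothing for "of" -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

This gives up partial-word matching. After changing the index analyzer, documents must be re-indexed, but the Admin UI analysis screen Alex mentions shows both token streams without reindexing, so it is a quick way to confirm that "of" now survives on the index side.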
> >>
> >> -Teague
> >>
> >>
> >> -----Original Message-----
> >> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
> >> Sent: Monday, July 14, 2014 4:04 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Of, To, and Other Small Words
> >>
> >> Hi Teague,
> >>
> >> The StopFilterFactory (which I think you're using) by default uses
> >> lang/stopwords_en.txt (which wouldn't be empty if you check).
> >> What you're looking at is stopwords.txt. You could either empty that
> >> file out or change the field type for your field.
> >>
> >>
> >> On Mon, Jul 14, 2014 at 12:53 PM, Teague James <teag...@insystechinc.com> wrote:
> >>> Hello all,
> >>>
> >>> I am working with Solr 4.9.0 and am searching for phrases that
> >>> contain words like "of" or "to", which Solr seems to be ignoring at index time.
> >>> Here's what I tried:
> >>>
> >>> curl "http://localhost/solr/update?commit=true" -H "Content-Type: text/xml" \
> >>>   --data-binary '<add><doc><field name="id">100</field><field name="content">blah blah blah knowledge of science blah blah blah</field></doc></add>'
> >>>
> >>> Then, using a browser:
> >>>
> >>> http://localhost/solr/collection1/select?q="knowledge+of+science"&fq=id:100
> >>>
> >>> I get zero hits. Searching for "knowledge" or "science" returns hits;
> >>> "knowledge of" or "of science" returns zero hits. I don't want to
> >>> use proximity if I can avoid it, as this may introduce too many
> >>> undesirable results. Stopwords.txt is blank, yet clearly Solr is ignoring "of" and "to",
> >>> and possibly more words that I have not discovered through testing
> >>> yet. Is there some other configuration file that contains these small
> >>> words? Is there any way to force Solr to pay attention to them and
> >>> not drop them from the phrase? Any advice is appreciated! Thanks!
> >>>
> >>> -Teague
> >>
> >>
> >> --
> >> Anshum Gupta
> >> http://www.anshumgupta.net
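On the question at the top about the "Lucene hardwired list": as far as I can tell from the Lucene/Solr 4.x sources, StopFilterFactory reads a file only when a `words` attribute is given. Without one, it falls back to Lucene's built-in English set (`StopAnalyzer.ENGLISH_STOP_WORDS_SET`), which includes "of" and "to". A `words` file that exists but is empty gives an empty set, not the built-in one. The fragment below contrasts the two cases. The field type names are hypothetical, and the word list in the comment should be checked against the Lucene version actually in use.

```xml
<!-- Hypothetical field types, for comparison only. -->

<!-- 1. No "words" attribute: StopFilterFactory falls back to Lucene's built-in
        English stop set (StopAnalyzer.ENGLISH_STOP_WORDS_SET in 4.x):
        a an and are as at be but by for if in into is it no not of on or
        such that the their then there these they this to was will with -->
<fieldType name="text_builtin_stops" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- 2. Explicit file: only the words listed in stopwords.txt are removed,
        so an empty stopwords.txt removes nothing. -->
<fieldType name="text_file_stops" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Either type can be checked in the Admin UI analysis screen: with the first, "of" should disappear from the token stream; with the second and an empty file, it should survive.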