Re: Of, To, and Other Small Words

Aman Tandon Tue, 15 Jul 2014 01:18:07 -0700

Hi Jack,


it will use the internal *Lucene hardwired list* of stop words


I was unaware of this. Could you please provide more information about
it?


With Regards
Aman Tandon
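
For anyone reading along, here is a minimal Python sketch of what a
"hardwired list" of stop words means in practice. It assumes, to the best of
my knowledge, that StopFilterFactory falls back on Lucene's built-in English
stop set (StopAnalyzer.ENGLISH_STOP_WORDS_SET) when no words attribute is
given. The word list is my recollection of that set, and remove_stop_words is
a hypothetical helper for illustration, not a Solr or Lucene API.

```python
# Assumption: this mirrors Lucene's built-in English stop set
# (StopAnalyzer.ENGLISH_STOP_WORDS_SET). Check your Lucene version to confirm.
DEFAULT_ENGLISH_STOP_WORDS = frozenset([
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
])


def remove_stop_words(tokens, stop_words=DEFAULT_ENGLISH_STOP_WORDS,
                      ignore_case=True):
    """Hypothetical helper: drop every token found in stop_words."""
    kept = []
    for token in tokens:
        # With ignore_case, compare the lowercased form against the list.
        key = token.lower() if ignore_case else token
        if key not in stop_words:
            kept.append(token)
    return kept


# With the assumed default list, "of" is dropped.
print(remove_stop_words("knowledge of science".split()))
# With an empty list (like a blank stopwords.txt), every token is kept.
print(remove_stop_words("knowledge of science".split(), stop_words=frozenset()))
```

Both "of" and "to" are in the assumed default list, so a stop filter using
it would drop them, while one pointed at an empty file would not.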


On Tue, Jul 15, 2014 at 7:21 AM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> You could try experimenting with CommonGramsFilterFactory and
> CommonGramsQueryFilter (slightly different). There are actually a lot
> of cool analyzers bundled with Solr. You can find the full list on my site
> at: http://www.solr-start.com/info/analyzers
>
> Regards,
>    Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On Tue, Jul 15, 2014 at 8:42 AM, Teague James <teag...@insystechinc.com>
> wrote:
> > Alex,
> >
> > Thanks! Great suggestion. I figured out that it was the
> > EdgeNGramFilterFactory. Taking that out of the mix did it.
> >
> > -Teague
> >
> > -----Original Message-----
> > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> > Sent: Monday, July 14, 2014 9:14 PM
> > To: solr-user
> > Subject: Re: Of, To, and Other Small Words
> >
> > Have you tried the Admin UI's Analysis screen? It will show you
> > what happens to the text as it progresses through the tokenizers and
> > filters. No need to reindex.
> >
> > Regards,
> >    Alex.
> > Personal: http://www.outerthoughts.com/ and @arafalov
> > Solr resources: http://www.solr-start.com/ and @solrstart
> > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
> >
> >
> > On Tue, Jul 15, 2014 at 8:10 AM, Teague James <teag...@insystechinc.com> wrote:
> >> Hi Anshum,
> >>
> >> Thanks for replying and suggesting this, but the field type I am using
> >> (a modified text_general) in my schema has the file set to 'stopwords.txt'.
> >>
> >> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
> >>   <analyzer type="index">
> >>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
> >>     <!-- in this example, we will only use synonyms at query time
> >>     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>-->
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <!-- CHANGE: The NGramFilterFactory was added to provide partial word search. This can be changed to
> >>     EdgeNGramFilterFactory side="front" to only match front sided partial searches if matching any
> >>     part of a word is undesirable.-->
> >>     <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="10" />
> >>     <!-- CHANGE: The PorterStemFilterFactory was added to allow matches for 'cat' and 'cats' by searching for 'cat' -->
> >>     <filter class="solr.PorterStemFilterFactory"/>
> >>   </analyzer>
> >>   <analyzer type="query">
> >>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
> >>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <!-- CHANGE: The PorterStemFilterFactory was added to allow matches for 'cat' and 'cats' by searching for 'cat' -->
> >>     <filter class="solr.PorterStemFilterFactory"/>
> >>   </analyzer>
> >> </fieldType>
> >>
> >> Just to be doubly sure, I cleared the list in stopwords_en.txt,
> >> restarted Solr, re-indexed, and searched again, still with zero results.
> >> Any other suggestions on where I might be able to control this behavior?
> >>
> >> -Teague
> >>
> >>
> >> -----Original Message-----
> >> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
> >> Sent: Monday, July 14, 2014 4:04 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Of, To, and Other Small Words
> >>
> >> Hi Teague,
> >>
> >> The StopFilterFactory (which I think you're using) by default uses
> >> lang/stopwords_en.txt (which wouldn't be empty if you check).
> >> What you're looking at is stopwords.txt. You could either empty that
> >> file out or change the field type for your field.
> >>
> >>
> >> On Mon, Jul 14, 2014 at 12:53 PM, Teague James <teag...@insystechinc.com> wrote:
> >>> Hello all,
> >>>
> >>> I am working with Solr 4.9.0 and am searching for phrases that
> >>> contain words like "of" or "to" that Solr seems to be ignoring at
> >>> index time. Here's what I tried:
> >>>
> >>> curl "http://localhost/solr/update?commit=true" -H "Content-Type: text/xml" \
> >>>   --data-binary '<add><doc><field name="id">100</field><field name="content">blah blah blah knowledge of science blah blah blah</field></doc></add>'
> >>>
> >>> Then, using a browser:
> >>>
> >>> http://localhost/solr/collection1/select?q="knowledge+of+science"&fq=id:100
> >>>
> >>> I get zero hits. If I search for "knowledge" or "science", I get hits,
> >>> but "knowledge of" or "of science" returns zero hits. I don't want to
> >>> use proximity if I can avoid it, as this may introduce too many
> >>> undesirable results. Stopwords.txt is blank, yet clearly Solr is
> >>> ignoring "of" and "to" and possibly more words that I have not
> >>> discovered through testing yet. Is there some other configuration file
> >>> that contains these small words? Is there any way to force Solr to pay
> >>> attention to them and not drop them from the phrase? Any advice is
> >>> appreciated! Thanks!
> >>>
> >>> -Teague
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> Anshum Gupta
> >> http://www.anshumgupta.net
> >>
> >
>
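
A possible explanation for why taking the n-gram filter out of the index
chain fixed the phrase search, sketched in Python. This is a simulation, not
Solr code: ngrams and index_terms are hypothetical helpers, whitespace
splitting stands in for StandardTokenizerFactory, and stemming and token
positions are ignored. It assumes the n-gram filter emits nothing for a token
shorter than minGramSize, which matches my understanding of the Lucene 4.x
filters. The sizes follow the minGramSize="3" and maxGramSize="10" settings
in the quoted schema.

```python
def ngrams(token, min_size=3, max_size=10):
    """Hypothetical helper: all substrings of token with a length
    between min_size and max_size (inclusive)."""
    grams = []
    for size in range(min_size, max_size + 1):
        # The range is empty whenever the token is shorter than size.
        for start in range(0, len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def index_terms(text, min_size=3, max_size=10):
    """Hypothetical helper: the set of terms an n-gram-only index
    chain would store for text (no stemming, no positions)."""
    terms = set()
    for token in text.lower().split():
        terms.update(ngrams(token, min_size, max_size))
    return terms


terms = index_terms("knowledge of science")
print(ngrams("of"))         # no grams: "of" is shorter than min_size
print("of" in terms)        # the short word never reaches the index
print("science" in terms)   # longer words are indexed in full
```

Under these assumptions, "of" and "to" produce no indexed terms at all,
while the query analyzer (which has no n-gram filter) still looks for them in
the phrase, so "knowledge of science" finds nothing even with an empty
stopwords.txt.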
