> From: Pooja Verlani <pooja.verl...@gmail.com>
> Subject: Phrase stopwords
> To: solr-user@lucene.apache.org
> Date: Wednesday, September 23, 2009, 1:15 PM
> Hi,
> Is it possible to have a phrase as a stopword in Solr? If so,
> please share
> how to do it.
> 
> regards,
> Pooja
> 

I think this can be implemented by chaining SynonymFilterFactory and 
StopFilterFactory, in that order:

<filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" 
expand="false"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>

syn.txt will contain lines such as:

phrase as a stopword => somestupidtoken
phrase stopword => somestupidtoken
three words stopword => somestupidtoken

stopwords.txt will contain the line:
somestupidtoken

IMO it will work, since SynonymFilterFactory can handle multi-word synonyms like 
a b c d => foo. With expand="false", you can use this filter to reduce each 
multi-word stopword to a single token (one that is unlikely to occur in 
your documents). StopFilter then removes this single token.
The net effect is that the multi-word phrases listed in syn.txt are removed 
from the token stream.
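
For context, here is roughly how the two filters could sit inside a field type 
in schema.xml. This is only a sketch: the fieldType name "text_phrasestop" and 
the choice of WhitespaceTokenizerFactory are my assumptions, so adapt them to 
your own schema.

```xml
<!-- Sketch only: the name "text_phrasestop" and the tokenizer are assumptions -->
<fieldType name="text_phrasestop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Step 1: collapse each phrase listed in syn.txt into the placeholder token -->
    <filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true"
            expand="false"/>
    <!-- Step 2: drop the placeholder token listed in stopwords.txt -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

The order matters. If a filter that removes ordinary stopwords such as "a" or 
"as" runs before the synonym filter, a phrase like "phrase as a stopword" will 
probably no longer match its entry in syn.txt, so keep any such filter after 
these two.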

Hope this helps.



      
