> From: Pooja Verlani <pooja.verl...@gmail.com>
> Subject: Phrase stopwords
> To: solr-user@lucene.apache.org
> Date: Wednesday, September 23, 2009, 1:15 PM
>
> Hi,
> Is it possible to have a phrase as a stopword in solr? In case, please share how to do so?
>
> regards,
> Pooja

I think that can be implemented by combining SynonymFilterFactory and StopFilterFactory:

<filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="false"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>

syn.txt will contain the lines:

phrase as a stopword => somestupidtoken
phrase stopword => somestupidtoken
three words stopword => somestupidtoken

stopwords.txt will contain the line:

somestupidtoken

IMO it will work, since SynonymFilterFactory can handle multi-word synonyms like a b c d => foo. With expand="false", you can use this filter to reduce your multi-word stopwords to a single token (one that is unlikely to occur in your documents). Then remove this single token with StopFilter. This combination will remove the multi-word entries listed in your syn.txt.

Hope this helps.
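
P.S. To make the two-step idea concrete, here is a minimal Python sketch of the logic. It is not Solr code and does not reproduce how SynonymFilterFactory works internally. It assumes whitespace tokenization, case-insensitive matching, and longest-match-first phrase replacement. The function names (parse_synonyms, replace_phrases, remove_stopwords, analyze) are made up for this illustration.

```python
def parse_synonyms(lines):
    """Parse 'a b c => token' rules into {('a', 'b', 'c'): 'token'}."""
    rules = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        lhs, rhs = line.split("=>")
        rules[tuple(lhs.lower().split())] = rhs.strip()
    return rules


def replace_phrases(tokens, rules):
    """Step 1: collapse each matching phrase into its placeholder token.

    Assumption: at every position the longest matching phrase wins.
    """
    max_len = max((len(key) for key in rules), default=0)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in rules:
                out.append(rules[key])
                i += n
                break
        else:
            # no phrase starts here, keep the token unchanged
            out.append(tokens[i])
            i += 1
    return out


def remove_stopwords(tokens, stopwords):
    """Step 2: drop the placeholder like any ordinary stopword."""
    return [t for t in tokens if t.lower() not in stopwords]


def analyze(text, synonym_lines, stopwords):
    tokens = text.split()  # assumption: plain whitespace tokenization
    rules = parse_synonyms(synonym_lines)
    return remove_stopwords(replace_phrases(tokens, rules), stopwords)


SYN_TXT = [
    "phrase as a stopword => somestupidtoken",
    "phrase stopword => somestupidtoken",
    "three words stopword => somestupidtoken",
]
STOPWORDS = {"somestupidtoken"}

print(analyze("solr phrase as a stopword example", SYN_TXT, STOPWORDS))
```

The placeholder only exists between the two steps, which is why it should be a token that never appears in real documents: any genuine occurrence of it would be removed as well.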