When I look at the text_de fieldType provided in the example schema, I can see:
>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
>         words="lang/stopwords_de.txt" format="snowball"
>         enablePositionIncrements="true"/>
> <filter class="solr.GermanNormalizationFilterFactory"/>
> <filter class="solr.GermanLightStemFilterFactory"/>

I have tried this, and it removed the words with Umlaute. It seems that this is because of format="snowball". I hadn't used it, because I thought I had one word per line. But maybe some invisible characters got into my stopword file and broke it.

Thanks.

Daniel

On Thu, Nov 8, 2012 at 10:36 AM, Daniel Brügge <daniel.brue...@googlemail.com> wrote:

> Yes, I did this and the words with the Umlaute went through the
> StopFilter. The ones without Umlaute were correctly removed.
>
> On Thu, Nov 8, 2012 at 2:22 AM, Lance Norskog <goks...@gmail.com> wrote:
>
>> You can debug this with the 'Analysis' page in the Solr UI. You pick
>> 'text_general' and then give words with umlauts in the text box for
>> indexing and queries.
>>
>> Lance
>>
>> ----- Original Message -----
>> | From: "Daniel Brügge" <daniel.brue...@googlemail.com>
>> | To: solr-user@lucene.apache.org
>> | Sent: Wednesday, November 7, 2012 8:45:45 AM
>> | Subject: SolrCloud, Zookeeper and Stopwords with Umlaute or other
>> | special characters
>> |
>> | Hi,
>> |
>> | I am running a SolrCloud cluster with the 4.0.0 version. I have a
>> | stopwords file which is in the correct encoding. It contains German
>> | Umlaute like e.g. 'ü'. I am also running a standalone Zookeeper
>> | which contains this stopwords file.
>> | In my schema I am using the stopwords file in the standard way:
>> |
>> | > <fieldType name="text_general" class="solr.TextField"
>> | >     positionIncrementGap="100">
>> | >   <analyzer type="index">
>> | >     <tokenizer class="solr.StandardTokenizerFactory"/>
>> | >     <filter class="solr.StopFilterFactory"
>> | >         ignoreCase="true"
>> | >         words="my_stopwords.txt"
>> | >         enablePositionIncrements="true" />
>> |
>> | When indexing, I noticed that all stopwords without Umlaute are
>> | correctly removed, but the ones with Umlaute still exist.
>> |
>> | Is this a problem with ZK or Solr?
>> |
>> | Thanks & regards
>> |
>> | Daniel
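
One way to test the guess that invisible characters got into the stopword file is to scan it outside of Solr. The following is only a minimal sketch in Python. The function name `inspect_stopwords` and the list of suspicious characters are assumptions made for illustration, not anything Solr or ZooKeeper provides, and it is not known whether any of these characters caused the behaviour described in this thread.

```python
import unicodedata

# Characters that are easy to miss in an editor.
# This list is an assumption for illustration and is not exhaustive.
SUSPICIOUS = {
    "\ufeff": "BOM / zero width no-break space",
    "\u200b": "zero width space",
    "\u00a0": "no-break space",
    "\r": "carriage return",
}


def inspect_stopwords(raw: bytes):
    """Return (line_number, line, problems) for every line that looks suspicious."""
    # Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    text = raw.decode("utf-8")
    findings = []
    for number, line in enumerate(text.split("\n"), start=1):
        problems = [name for ch, name in SUSPICIOUS.items() if ch in line]
        # A combining mark means the word is stored in decomposed form,
        # e.g. 'u' followed by U+0308 instead of the single character 'ü'.
        if any(unicodedata.combining(ch) for ch in line):
            problems.append("combining mark (decomposed form)")
        if problems:
            findings.append((number, line, problems))
    return findings


if __name__ == "__main__":
    # Hypothetical file content: a BOM on line 1, a decomposed 'ü' on line 3.
    sample = "\ufeffüber\nfür\nu\u0308ber\nund\n".encode("utf-8")
    for number, line, problems in inspect_stopwords(sample):
        print(number, repr(line), problems)
```

To check a real file, read it in binary mode (for example `open("my_stopwords.txt", "rb").read()`) and pass the bytes to `inspect_stopwords`. If the copy stored in ZooKeeper might differ from the local one, the same check would have to be run on the bytes fetched from ZooKeeper.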