In Germany we have a strange habit of treating umlaut letters as equivalent to a two-letter representation. For example, 'ä' and 'ae' are expected to give the same search results. To achieve this I added this filter to the "text" fieldtype definition:

<filter class="solr.PatternReplaceFilterFactory" pattern="ä" replacement="ae" replace="all" />

to both the index and query analyzers (and more for the other umlauts).
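For illustration, the complete set of umlaut filters looks roughly like the sketch below. Only the 'ä' line is quoted exactly from my schema; the 'ö' and 'ü' lines are assumed to follow the same pattern with the usual German two-letter replacements.

```xml
<!-- Sketch only: placed in both the index and the query analyzer of the "text" fieldtype. -->
<!-- The first filter is the one quoted above; the other two are assumed analogues. -->
<filter class="solr.PatternReplaceFilterFactory" pattern="ä" replacement="ae" replace="all" />
<filter class="solr.PatternReplaceFilterFactory" pattern="ö" replacement="oe" replace="all" />
<filter class="solr.PatternReplaceFilterFactory" pattern="ü" replacement="ue" replace="all" />
```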
This works well when I search for a name (a word that is not stemmed), but not with a word such as "Wärme":

search for 'wärme' works
search for 'waerme' does not work
search for 'waerm' works if I move the EnglishPorterFilterFactory after the PatternReplaceFilterFactory.

DebugQuery for "waerme" gives a parsedquery of FS:waerm. What I don't understand is why the (existing) records are not found. If I understand it correctly, 'waerm' should be in the index as well.

By the way, the reason I keep the EnglishPorterFilterFactory is that the records are in many languages, English stemming gives good results in many cases, and I don't want (yet) to multiply my fields to have language-specific versions. But even if the stemming is not right because the language is not English, I think records should be found as long as the analyzers are the same for index and query.

This is with Solr 1.3.

Can someone shed some light on what is going on and how I can achieve my goal?

-Michael