Sorry, the message was not meant to be sent here. We are struggling with the same problem here.
2011/1/11 Matti Oinas <matti.oi...@gmail.com>:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
>
> On wildcard and fuzzy searches, no text analysis is performed on the
> search word.
>
> 2011/1/11 Kári Hreinsson <k...@gagnavarslan.is>:
>> Hi,
>>
>> I am having a problem with the fact that no text analysis is performed on
>> wildcard queries. I have the following field type (a bit simplified):
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer>
>>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>     <filter class="solr.TrimFilterFactory" />
>>     <filter class="solr.LowerCaseFilterFactory" />
>>     <filter class="solr.ASCIIFoldingFilterFactory" />
>>   </analyzer>
>> </fieldType>
>>
>> My problem has to do with Icelandic characters. When I index a document with
>> a text field including the word "sjálfsögðu", it gets indexed as "sjalfsogdu"
>> (because of the ASCIIFoldingFilterFactory, which replaces the Icelandic
>> characters with their English equivalents). Then, when I search (without a
>> wildcard) for "sjálfsögðu" or "sjalfsogdu", I get that document as a result.
>> This is convenient since it enables people to search without using accented
>> characters and still get the results they want (e.g. if they are working on
>> computers with English keyboards).
>>
>> However, this all falls apart with wildcard searches: the search string
>> isn't passed through the filters, so even if I search for "sjálf*" I
>> don't get any results, because the index doesn't contain the original words
>> (I do get results if I search for "sjalf*"). I know people have been having a
>> similar problem with the case sensitivity of wildcard queries, and most often
>> the solution seems to be to lowercase the string before passing it on to
>> Solr, which is not exactly an optimal solution (yet a simple one in that
>> case). The Icelandic characters complicate things a bit, and applying the
>> same solution (doing the lowercasing and character mapping) in my
>> application seems like unnecessary duplication of code that is already part
>> of Solr, not to mention the added complexity in my application and possible
>> maintenance down the road.
>>
>> Is there any way around this? How are people solving this? Is there a way
>> to apply the filters to wildcard queries? I guess removing the
>> ASCIIFoldingFilterFactory is the simplest "solution", but this
>> "normalization" (of the text done by the filter) is often very useful.
>>
>> I hope I'm not overlooking some obvious explanation. :/
>>
>> Thanks in advance,
>> Kári Hreinsson
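
For reference, the client-side workaround described in the quoted message (lowercasing and character mapping in the application before the wildcard term reaches Solr) could look roughly like the sketch below. This is only an illustration, not part of Solr: the function name fold_wildcard_term and the SPECIAL table are hypothetical, and the replacements for ð, þ and æ are my assumption of what ASCIIFoldingFilterFactory produces, so they should be checked against the index. It also duplicates the analyzer logic in the application, which is exactly the drawback raised above.

```python
import unicodedata

# Letters that Unicode decomposition does not split into "base letter + accent".
# Assumption: these replacements match what ASCIIFoldingFilterFactory emits;
# verify against your Solr version before relying on them.
SPECIAL = {"ð": "d", "þ": "th", "æ": "ae"}


def fold_wildcard_term(term: str) -> str:
    """Lowercase a wildcard term and fold accented letters to ASCII.

    The wildcard characters '*' and '?' pass through unchanged.
    """
    lowered = term.lower()
    # Replace the letters that have no canonical decomposition.
    replaced = "".join(SPECIAL.get(ch, ch) for ch in lowered)
    # Split accented letters into base letter + combining mark ...
    decomposed = unicodedata.normalize("NFKD", replaced)
    # ... then drop the combining marks, keeping only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With this, a user query such as "sjálf*" would be sent to Solr as "sjalf*", which is the form the quoted message reports as matching the indexed terms.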