Hi, I have a question about phonetic search and matching in Solr. In our application, the entire content of an article is written to a full-text search field that applies stemming and a phonetic filter (Cologne phonetic for German). This is the relevant part of the configuration for the index analyzer (the query analyzer is analogous):
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="German2" />
<filter class="solr.PhoneticFilterFactory" encoder="ColognePhonetic" inject="true"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory" />

Unfortunately, this sometimes produces strange, though explainable, matches. For example, the content field indexes the following string:

Donnerstag von 13 bis 17 Uhr

A search for "puf" matches this content, because the phonetic filter encodes "puf" as 13, which is identical to the literal token 13 in the text. (As a consequence, the 13 is also highlighted.)

Does anyone have an idea how to handle this in a reasonable way, so that a search for "puf" does not match 13 in the content?

Thanks in advance!
Dirk
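
P.S. To make the collision easy to reproduce outside Solr, here is a minimal Python sketch of the Cologne phonetic rules. It is a simplified re-implementation of the published rules as I understand them, not the Apache Commons Codec encoder that Solr actually uses, and the function name cologne_phonetic is made up for illustration. It only shows why "puf" ends up as the code 13.

```python
def cologne_phonetic(word: str) -> str:
    """Simplified Cologne phonetic encoder (illustrative sketch only,
    not the Apache Commons Codec implementation used by Solr)."""
    # Uppercase, fold umlauts ("ß" already becomes "SS" via upper()),
    # and drop every character that is not a plain letter A-Z.
    text = word.upper().translate(str.maketrans({"Ä": "A", "Ö": "O", "Ü": "U"}))
    letters = [c for c in text if "A" <= c <= "Z"]

    codes = []
    for i, c in enumerate(letters):
        prev = letters[i - 1] if i > 0 else ""
        nxt = letters[i + 1] if i + 1 < len(letters) else ""
        if c in set("AEIJOUY"):
            code = "0"
        elif c == "H":
            continue  # H carries no code of its own
        elif c == "B":
            code = "1"
        elif c == "P":
            code = "3" if nxt == "H" else "1"
        elif c in set("DT"):
            code = "8" if nxt in set("CSZ") else "2"
        elif c in set("FVW"):
            code = "3"
        elif c in set("GKQ"):
            code = "4"
        elif c == "C":
            if i == 0:
                code = "4" if nxt in set("AHKLOQRUX") else "8"
            elif prev in set("SZ"):
                code = "8"
            else:
                code = "4" if nxt in set("AHKOQUX") else "8"
        elif c == "X":
            code = "8" if prev in set("CKQ") else "48"
        elif c == "L":
            code = "5"
        elif c in set("MN"):
            code = "6"
        elif c == "R":
            code = "7"
        else:  # S and Z
            code = "8"
        codes.append(code)

    # Collapse runs of identical digits, then drop every 0 except a leading one.
    joined = "".join(codes)
    collapsed = "".join(
        d for i, d in enumerate(joined) if i == 0 or d != joined[i - 1]
    )
    return collapsed[:1] + collapsed[1:].replace("0", "")


if __name__ == "__main__":
    print(cologne_phonetic("puf"))  # -> 13
```

In this sketch the digits of the number 13 produce no code at all, so my understanding is that the token 13 stays in the index as plain text, while the query "puf" is expanded to its code 13 because of inject="true". That is where the two meet.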