Phonetic search and matching

Dirk Högemann Mon, 06 Feb 2012 02:45:10 -0800

Hi,

I have a question about phonetic search and matching in Solr.
In our application, the entire content of an article is written to a full-text
search field that applies stemming and a phonetic filter (Cologne
phonetics for German).
This is the relevant part of the configuration for the index analyzer
(the query analyzer is analogous):


        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
        <filter class="solr.PhoneticFilterFactory"
                encoder="ColognePhonetic" inject="true"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

Unfortunately, this sometimes produces strange, though explainable,
matches.
For example:

The content field indexes the following string: Donnerstag von 13 bis 17 Uhr.

A search for "puf" matches this content, because the phonetic filter
encodes "puf" as 13.
(As a consequence, the 13 is also highlighted.)
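
To make the collision concrete, here is a minimal sketch of the Cologne
phonetics rules in plain Python. It is not the encoder Solr uses (that one
comes from a Java library); the function name cologne_phonetic and the
handling of umlauts and non-letters are assumptions made for illustration.
It only shows that "puf" is encoded as the digit string 13, which is
identical to the literal token 13 in the content.

```python
# Illustrative re-implementation of the Cologne phonetics rules.
# Assumption: non-letters are dropped and umlauts are folded to their base vowel.
VOWELS = set("AEIJOUY")
C_ONSET_HARD = set("AHKLOQRUX")   # C at word start before these -> 4
C_INNER_HARD = set("AHKOQUX")     # C elsewhere before these -> 4


def cologne_phonetic(word: str) -> str:
    text = word.upper().translate(str.maketrans("ÄÖÜ", "AOU"))
    letters = [c for c in text if "A" <= c <= "Z"]
    codes = []
    for i, c in enumerate(letters):
        prev = letters[i - 1] if i > 0 else ""
        nxt = letters[i + 1] if i + 1 < len(letters) else ""
        if c in VOWELS:
            code = "0"
        elif c == "H":
            code = ""                      # H is ignored
        elif c == "B":
            code = "1"
        elif c == "P":
            code = "3" if nxt == "H" else "1"
        elif c in ("D", "T"):
            code = "8" if nxt in ("C", "S", "Z") else "2"
        elif c in ("F", "V", "W"):
            code = "3"
        elif c in ("G", "K", "Q"):
            code = "4"
        elif c == "C":
            if i == 0:
                code = "4" if nxt in C_ONSET_HARD else "8"
            elif prev in ("S", "Z"):
                code = "8"
            else:
                code = "4" if nxt in C_INNER_HARD else "8"
        elif c == "X":
            code = "8" if prev in ("C", "K", "Q") else "48"
        elif c == "L":
            code = "5"
        elif c in ("M", "N"):
            code = "6"
        elif c == "R":
            code = "7"
        else:                              # S and Z
            code = "8"
        codes.append(code)

    # Collapse runs of the same digit, then drop every 0 except a leading one.
    collapsed = []
    for digit in "".join(codes):
        if not collapsed or collapsed[-1] != digit:
            collapsed.append(digit)
    return "".join(d for i, d in enumerate(collapsed) if d != "0" or i == 0)


query_code = cologne_phonetic("puf")
content_tokens = "Donnerstag von 13 bis 17 Uhr".lower().split()
print(query_code)                    # 13
print(query_code in content_tokens)  # True: the code equals the literal token
```

The sketch assumes that the literal token 13 ends up in the same field as
the phonetic codes, so a digit-only code and a real number cannot be told
apart at query time.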

Does anyone have an idea how to handle this in a reasonable way, so that a
search for "puf" does not match the 13 in the content?

Thanks in advance!

Dirk
