Yes, you could do that. I suspect numbers will give you trouble with
phonetic filtering whatever you do.
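
For what it's worth, here is an untested sketch of the pattern-filter idea quoted below, applied only to the analyzer chain that feeds the phonetic filter. The regex and the max length are assumptions for illustration. The idea is to blank out purely numeric tokens and then drop the resulting empty tokens, so that a literal "13" never sits in the index next to the phonetic codes.

```xml
<!-- Sketch only: these filters go before the PhoneticFilterFactory. -->
<!-- Blank out tokens that consist solely of digits (pattern is an assumption). -->
<filter class="solr.PatternReplaceFilterFactory"
        pattern="^[0-9]+$" replacement="" replace="all"/>
<!-- Drop the now-empty tokens; max is an arbitrary upper bound. -->
<filter class="solr.LengthFilterFactory" min="1" max="255"/>
<filter class="solr.PhoneticFilterFactory"
        encoder="ColognePhonetic" inject="false"/>
```

With a chain like this, numbers would only be searchable through a separate field that has no phonetic filtering.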

You may be able to search both fields and give your non-phonetic
field a higher boost, so that exact matches rank ahead of purely
phonetic ones.
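
Something along these lines, as an untested sketch. The field names content and content_phonetic, the fieldType names, the handler name /search and the boost value 10 are all made up for illustration; the boost would need tuning against real queries.

```xml
<!-- schema.xml (sketch): one plain field, one phonetic field fed by copyField. -->
<!-- text_de and text_de_phonetic are hypothetical fieldType names, assumed to be -->
<!-- defined elsewhere without and with the PhoneticFilterFactory respectively. -->
<field name="content" type="text_de" indexed="true" stored="true"/>
<field name="content_phonetic" type="text_de_phonetic" indexed="true" stored="false"/>
<copyField source="content" dest="content_phonetic"/>

<!-- solrconfig.xml (sketch): query both fields, weighting the plain one higher. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">content^10 content_phonetic</str>
  </lst>
</requestHandler>
```

Note that the boost only affects ranking: a document that matches in the phonetic field alone is still returned, just with a lower score. The plain dismax parser accepts the same qf parameter.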

Best
Erick

On Tue, Feb 7, 2012 at 2:30 PM, Dirk Högemann
<dirk.hoegem...@googlemail.com> wrote:
> Thanks Erick.
> At first we thought of removing numbers with a pattern filter.
> Setting inject to false will have the "same" effect.
> If we want to be able to search for numbers in the content, this solution
> will not work, but another field without phonetic filtering and searching in
> both fields would be OK, right?
>
> Dirk
> On 07.02.2012 at 14:01, "Erick Erickson" <erickerick...@gmail.com> wrote:
>
>> What happens if you do NOT inject? Setting inject="false"
>> stores only the phonetic reduction, not the original text. In that
>> case your false match on "13" would go away...
>>
>> Not sure what that means for the rest of your app though.
>>
>> Best
>> Erick
>>
>> On Mon, Feb 6, 2012 at 5:44 AM, Dirk Högemann
>> <dirk.hoegem...@googlemail.com> wrote:
>> > Hi,
>> >
>> > I have a question on phonetic search and matching in Solr.
>> > In our application all the content of an article is written to a full-text
>> > search field, which provides stemming and a phonetic filter (Cologne
>> > phonetic for German).
>> > This is the relevant part of the configuration for the index analyzer
>> > (search is analogous):
>> >
>> >        <tokenizer class="solr.StandardTokenizerFactory"/>
>> >        <filter class="solr.WordDelimiterFilterFactory"
>> >                generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> >                catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>> >        <filter class="solr.LowerCaseFilterFactory"/>
>> >        <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
>> >        <filter class="solr.PhoneticFilterFactory"
>> >                encoder="ColognePhonetic" inject="true"/>
>> >        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> >
>> > Unfortunately this sometimes results in strange, though explainable,
>> > matches.
>> > For example:
>> >
>> > The content field indexes the following string: Donnerstag von 13 bis 17 Uhr.
>> >
>> > This produces a match if we search for "puf", because the phonetic
>> > filter encodes "puf" as 13, which equals the literal token 13 in the text.
>> > (As a consequence, the 13 is then also highlighted.)
>> >
>> > Does anyone have an idea how to handle this in a reasonable way, so that a
>> > search for "puf" does not match 13 in the content?
>> >
>> > Thanks in advance!
>> >
>> > Dirk
>>
