Hi!

I have a text field type "countystring" which I need for faceting.
This single-valued field should contain names of German counties like
"Südliche Weinstraße". No tokenizing, stemming etc. is intended. Only
one SynonymFilterFactory is applied.

    <fieldType name="countystring" class="solr.TextField">
        <analyzer>
            <filter class="solr.SynonymFilterFactory" synonyms="county-corrections.txt" ignoreCase="false" expand="false"/>
            <tokenizer class="solr.KeywordTokenizerFactory"/>
        </analyzer>
    </fieldType>
    <field name="county" type="countystring" indexed="true" stored="true" required="false"/>


In "county-corrections.txt" (which is UTF-8-encoded) I have mappings
like the following. Some of them work, others don't:

# these are applied as expected:
Vogelbergkreis => Vogelsbergkreis
Weissenburg-Gunzenhausen => Weißenburg-Gunzenhausen

# these aren't applied:
Südliche Weinstrasse => Südliche Weinstraße
"Südliche Weinstrasse" => Südliche Weinstraße
Stadtkreis Amberg => Amberg
"Stadtkreis Amberg" => Amberg
Köthen => Anhalt-Bitterfeld
K\u00F6then => Anhalt-Bitterfeld


It seems as if only those mappings without whitespace and without
non-ASCII characters are accepted. As you can see, I have tried out
various things, like quoting and encoding non-ASCII characters in
hexadecimal notation. None of them seems to work.
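
To check that observation against the whole file rather than a handful of entries, a small diagnostic along these lines might help. It is only a sketch and does not involve Solr at all: `parse_rules` and `classify` are hypothetical helper names, and it assumes every rule has the simple `left => right` form shown above. It reports, for each left-hand side, whether it contains whitespace or non-ASCII characters.

```python
def parse_rules(text):
    """Yield (left, right) pairs from 'left => right' lines.

    Blank lines, '#' comments and lines without '=>' are skipped.
    Assumption: only the simple explicit-mapping form is used.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=>" not in line:
            continue
        left, right = (part.strip() for part in line.split("=>", 1))
        yield left, right


def classify(left):
    """Return the traits of a left-hand side suspected to prevent matching."""
    traits = []
    if any(ch.isspace() for ch in left):
        traits.append("whitespace")
    if not left.isascii():
        traits.append("non-ascii")
    return traits


# A few of the rules from above; for the real file one would use
# open("county-corrections.txt", encoding="utf-8").read() instead.
SAMPLE = """
# sample rules
Vogelbergkreis => Vogelsbergkreis
Südliche Weinstrasse => Südliche Weinstraße
Stadtkreis Amberg => Amberg
Köthen => Anhalt-Bitterfeld
"""

for left, right in parse_rules(SAMPLE):
    traits = ", ".join(classify(left)) or "plain ASCII, no whitespace"
    print(f"{left!r} -> {right!r}: {traits}")
```

If every rule that fails carries one of the two traits and every rule that works carries neither, that would support the suspicion above.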

Is there a solution?

Thanks!

Marian
