Hi!
I have a text field type "countystring" which I need for faceting.
This single-valued field should contain names of German counties like
"Südliche Weinstraße". No tokenizing, stemming, etc. is intended. The
only filter applied is a SynonymFilterFactory:
<fieldType name="countystring" class="solr.TextField">
  <analyzer>
    <filter class="solr.SynonymFilterFactory"
            synonyms="county-corrections.txt" ignoreCase="false" expand="false"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="county" type="countystring" indexed="true"
       stored="true" required="false" />
In "county-corrections.txt" (which is UTF-8-encoded) I have mappings
like the following. Some of them work, others don't:
# these are applied as expected:
Vogelbergkreis => Vogelsbergkreis
Weissenburg-Gunzenhausen => Weißenburg-Gunzenhausen
# these aren't applied:
Südliche Weinstrasse => Südliche Weinstraße
"Südliche Weinstrasse" => Südliche Weinstraße
Stadtkreis Amberg => Amberg
"Stadtkreis Amberg" => Amberg
Köthen => Anhalt-Bitterfeld
K\u00F6then => Anhalt-Bitterfeld
It seems as if only those mappings without whitespace and without
non-ASCII characters are accepted. As you can see, I have tried out
various things like quoting and encoding non-ASCII characters in
hexadecimal escape notation. None of them seems to work.
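One variant I have not been able to verify: SynonymFilterFactory accepts an
optional tokenizerFactory attribute, which controls how the entries in the
synonyms file itself are split into tokens. A minimal sketch, assuming this
attribute is available in my Solr version and that keeping each side of a
mapping as a single token is what the multi-word entries need (whether it
also helps with the non-ASCII entries is a guess):

```xml
<!-- Sketch only, not tested. tokenizerFactory is set on the synonym filter
     so that each side of a mapping in county-corrections.txt is read as one
     token, matching the single token produced by KeywordTokenizerFactory. -->
<fieldType name="countystring" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="county-corrections.txt"
            tokenizerFactory="solr.KeywordTokenizerFactory"
            ignoreCase="false" expand="false"/>
  </analyzer>
</fieldType>
```

With this variant the quoted forms of the mappings would presumably no longer
be needed, since the quotes would become part of the token.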
Is there a solution?
Thanks!
Marian