WordDelimiterFilter splits at non-ASCII chars

Stefan Oestreicher Tue, 15 Jul 2008 07:31:09 -0700

Hi,

As I understand it, the WordDelimiterFilter should split on case changes, word
delimiters and transitions between letters and digits, but it should not
differentiate between ASCII and non-ASCII (multibyte) characters. It does, however. The word
"hälse" (the German plural of "neck") gets split into "h", "ä" and "lse", which
unfortunately renders this filter quite unusable for me. Am I missing
something, or is this a bug?
I'm using Solr 1.3 built from trunk.
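
The symptom can be reproduced in a minimal standalone sketch. This is not Solr's code, and the cause inside WordDelimiterFilter is only an assumption here: the sketch supposes that characters are classified with ASCII-only range checks, so "ä" falls outside the letter class and forces a split on both sides. The class `SplitSketch`, the `Kind` enum and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Hypothetical character classes, not Solr's own constants.
    enum Kind { LOWER, UPPER, DIGIT, OTHER }

    // ASCII-only classification: anything outside a-z, A-Z, 0-9 is OTHER.
    static Kind asciiKind(char c) {
        if (c >= 'a' && c <= 'z') return Kind.LOWER;
        if (c >= 'A' && c <= 'Z') return Kind.UPPER;
        if (c >= '0' && c <= '9') return Kind.DIGIT;
        return Kind.OTHER;
    }

    // Unicode-aware classification using java.lang.Character.
    static Kind unicodeKind(char c) {
        if (Character.isLowerCase(c)) return Kind.LOWER;
        if (Character.isUpperCase(c)) return Kind.UPPER;
        if (Character.isDigit(c)) return Kind.DIGIT;
        return Kind.OTHER;
    }

    static Kind kind(char c, boolean asciiOnly) {
        return asciiOnly ? asciiKind(c) : unicodeKind(c);
    }

    // Splits wherever the character class changes, except from an
    // upper-case letter to a lower-case one (so "Hello" stays whole).
    // Simplification: OTHER characters are kept as parts, not dropped.
    static List<String> split(String word, boolean asciiOnly) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= word.length(); i++) {
            boolean boundary = i == word.length();
            if (!boundary) {
                Kind prev = kind(word.charAt(i - 1), asciiOnly);
                Kind cur = kind(word.charAt(i), asciiOnly);
                boundary = prev != cur
                        && !(prev == Kind.UPPER && cur == Kind.LOWER);
            }
            if (boundary) {
                parts.add(word.substring(start, i));
                start = i;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String word = "h\u00e4lse"; // "hälse"
        System.out.println(split(word, true));  // three parts: h, ä, lse
        System.out.println(split(word, false)); // one part: hälse
    }
}
```

With the ASCII-only classifier the word breaks into exactly the three parts I am seeing, while the `Character`-based classifier keeps it whole, which is the behaviour I would expect from the filter.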


TIA,
 
Stefan Oestreicher
