RE: WordDelimiterFilter splits at non-ASCII chars

Stefan Oestreicher Wed, 16 Jul 2008 01:34:30 -0700

Yes, you're right. I was testing with analysis.jsp, but it chokes on multibyte
chars.
I modified the JSP to set the request encoding using
request.setCharacterEncoding("UTF-8");
and now it's working fine. Is this a bug in analysis.jsp?
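
For illustration, here is a minimal sketch of the kind of charset mismatch
that may be involved. It assumes the browser sends the form value as UTF-8
and the container decodes it as ISO-8859-1 because no request encoding was
set. The class and method names are made up for this example, and this is
not the actual analysis.jsp code.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Encode as UTF-8 (assumed to be what the browser sends), then decode
    // as ISO-8859-1 (assumed fallback when no request encoding is set).
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "hälse", written with an escape so the source file encoding
        // does not matter.
        String word = "h\u00e4lse";
        String garbled = misdecode(word);

        System.out.println(word.length());    // 5
        System.out.println(garbled.length()); // 6

        // List each decoded character and whether Java treats it as a letter.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X letter=%b%n", (int) c, Character.isLetter(c));
        }
    }
}
```

Under these assumptions the single "ä" arrives as two characters (U+00C3 and
U+00A4), and the second one is not a letter, so a filter that splits on
character classes sees a different token stream than the one that was typed.
Note that setCharacterEncoding only takes effect if it is called before any
request parameter is read.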


thanks,
 
Stefan Oestreicher 

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf 
> Of Yonik Seeley
> Sent: Tuesday, July 15, 2008 6:29 PM
> To: solr-user@lucene.apache.org
> Subject: Re: WordDelimiterFilter splits at non-ASCII chars
> 
> On Tue, Jul 15, 2008 at 10:29 AM, Stefan Oestreicher
> <[EMAIL PROTECTED]> wrote:
> > as I understand the WordDelimiterFilter should split on case changes,
> > word delimiters and changes from character to digit, but it should not
> > differentiate between ASCII and multibyte chars. It does however. The
> > word "hälse" (German plural of "neck") gets split into "h", "ä" and
> > "lse", which unfortunately renders this filter quite unusable for me.
> > Am I missing something or is this a bug?
> > I'm using Solr 1.3 built from trunk.
> 
> Look for charset issues in communicating with Solr. I just tried this
> with the "text" field via Solr's analysis.jsp and it works fine.
> 
> -Yonik
>
