Lucene standard analyzer internationalization

2008-04-22 Thread Prashant Malik
Hi, we have been observing the following problem while tokenizing with Lucene's StandardAnalyzer: the tokens we get are different on different machines. I suspect it has something to do with the locale settings on the individual machines. For example, the word 'César' is split as
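As an aside (not from the original thread): a minimal sketch of running a string through StandardAnalyzer and printing the resulting tokens, assuming a recent Lucene release with the no-argument StandardAnalyzer constructor and the attribute-based TokenStream API (the 2008-era 2.x API returned Token objects from next() instead); the field name "body" is only illustrative.

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    import java.io.IOException;
    import java.io.StringReader;

    public class TokenizeDemo {
        public static void main(String[] args) throws IOException {
            // "César" written with a Unicode escape so the source file's
            // encoding cannot change the input string.
            String text = "C\u00E9sar";

            Analyzer analyzer = new StandardAnalyzer();
            try (TokenStream ts = analyzer.tokenStream("body", new StringReader(text))) {
                CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
                ts.reset();
                while (ts.incrementToken()) {
                    System.out.println(term.toString());
                }
                ts.end();
            }
            analyzer.close();
        }
    }

Run this way, the word should come out as the single lowercased token "césar" on every machine, because the string never passes through a locale-dependent byte-to-character conversion.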

Re: Lucene standard analyzer internationalization

2008-04-22 Thread Prashant Malik
have it right till 2; 3, 4, 5 are a single character. Thx, PM. On Tue, Apr 22, 2008 at 12:01 PM, Steven A Rowe [EMAIL PROTECTED] wrote: Hi Prashant, On 04/22/2008 at 2:23 PM, Prashant Malik wrote: We have been observing the following problem while tokenizing using Lucene's
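One plausible cause (my assumption, not a diagnosis given in the thread): if the text is decoded from bytes using the platform default charset, the locale-dependent default decides whether the UTF-8 'é' (two bytes, 0xC3 0xA9) becomes one character or two, so different machines hand different strings to the analyzer. A small demo of the difference:

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    public class CharsetDemo {
        public static void main(String[] args) {
            // UTF-8 bytes for "César"; the 'é' occupies two bytes.
            byte[] utf8Bytes = "C\u00E9sar".getBytes(StandardCharsets.UTF_8);

            // Decoded as UTF-8: 5 characters, 'é' is a single char.
            String asUtf8 = new String(utf8Bytes, StandardCharsets.UTF_8);
            // Decoded as ISO-8859-1 (a common platform default): 6 characters,
            // the 'é' turns into the two characters "Ã©".
            String asLatin1 = new String(utf8Bytes, StandardCharsets.ISO_8859_1);

            System.out.println(asUtf8 + "  length=" + asUtf8.length());
            System.out.println(asLatin1 + "  length=" + asLatin1.length());
            System.out.println("default charset: " + Charset.defaultCharset());
        }
    }

Reading the input with an explicit charset (for example, new InputStreamReader(in, StandardCharsets.UTF_8)) removes the machine-to-machine variation before the text ever reaches the tokenizer.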