Hi,
We have been observing the following problem while tokenizing with
Lucene's StandardAnalyzer: the tokens we get are different on different
machines. I suspect it has something to do with the Locale settings on
the individual machines?
For example, the word 'CÃ©sar' is split so that the tokens are
right up to token 2, while tokens 3, 4, and 5 each come out as a
single character.
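(A side note on what may be happening: the garbled form 'CÃ©sar' is exactly what 'César' looks like when its UTF-8 bytes are decoded with a single-byte charset, which suggests the difference comes from the platform default charset used to read the input, not from the analyzer itself. A minimal JDK-only sketch, no Lucene involved, the class name is mine:)

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    public static void main(String[] args) {
        // "César" encoded as UTF-8: the 'é' becomes two bytes, 0xC3 0xA9
        byte[] utf8 = "César".getBytes(StandardCharsets.UTF_8);

        // Decoded with the matching charset, the string round-trips cleanly
        String good = new String(utf8, StandardCharsets.UTF_8);

        // Decoded with a single-byte charset (a common platform default on
        // some machines), the two bytes of 'é' become two characters: 'Ã' '©'
        String bad = new String(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(good); // César
        System.out.println(bad);  // CÃ©sar
    }
}
```

If the machines that mis-tokenize are reading the text with their platform default charset (e.g. via a Reader constructed without an explicit charset), forcing UTF-8 at read time should make the token output identical everywhere.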
Thx
PM
On Tue, Apr 22, 2008 at 12:01 PM, Steven A Rowe [EMAIL PROTECTED] wrote:
Hi Prashant,
On 04/22/2008 at 2:23 PM, Prashant Malik wrote:
We have been observing the following problem while
tokenizing using lucene's