Re: Lucene standard analyzer internationalization

2008-04-22 Thread Chris Hostetter
: Yes, the versions of Lucene and Java are exactly the same on the different machines.
: In fact, we unpacked the Lucene jar and repackaged it into our own jar, and we are running from the same NFS mounts on both machines.
I didn't do an in-depth code read, but a quick skim of StandardTokenizerImpl didn't turn up a

RE: Lucene standard analyzer internationalization

2008-04-22 Thread Steven A Rowe
Hi Prashant,
What is the Unicode code point associated with the 3,4,5 character?
Steve
On 04/22/2008 at 4:45 PM, Prashant Malik wrote:
> Yes, the versions of Lucene and Java are exactly the same on the different machines.
> In fact, we unpacked the Lucene jar and repackaged it into our own jar, and we are running f
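One way to answer the code point question is to print every code point of the suspect string on each machine and compare the lists. The following is only a minimal sketch, assuming Java 8 or later; the class name `CodePointDump` and the sample strings are illustrative and do not come from the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDump {
    // Returns one "U+XXXX" label per Unicode code point in the input string.
    static List<String> codePoints(String text) {
        List<String> labels = new ArrayList<>();
        text.codePoints().forEach(cp -> labels.add(String.format("U+%04X", cp)));
        return labels;
    }

    public static void main(String[] args) {
        // Assumed sample: "C", e with acute accent (U+00E9), then "sar".
        System.out.println(codePoints("C\u00e9sar"));
        // prints [U+0043, U+00E9, U+0073, U+0061, U+0072]

        // Assumed sample: the same word after its UTF-8 bytes were misread as ISO-8859-1.
        System.out.println(codePoints("C\u00c3\u00a9sar"));
        // prints [U+0043, U+00C3, U+00A9, U+0073, U+0061, U+0072]
    }
}
```

If the two machines print different lists for the same input, the text is already different before it reaches the analyzer.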

Re: Lucene standard analyzer internationalization

2008-04-22 Thread Prashant Malik
Yes, the versions of Lucene and Java are exactly the same on the different machines. In fact, we unpacked the Lucene jar and repackaged it into our own jar, and we are running from the same NFS mounts on both machines. We have also tried Lucene 2.2.0 and 2.3.1, with the same result. Also, about the actual string u

RE: Lucene standard analyzer internationalization

2008-04-22 Thread Steven A Rowe
Hi Prashant,
On 04/22/2008 at 2:23 PM, Prashant Malik wrote:
> We have been observing the following problem while tokenizing with Lucene's StandardAnalyzer.
> The tokens we get are different on different machines.
> I suspect it has something to do with the Locale settings on individu

Lucene standard analyzer internationalization

2008-04-22 Thread Prashant Malik
Hi, We have been observing the following problem while tokenizing with Lucene's StandardAnalyzer: the tokens we get are different on different machines. I suspect it has something to do with the Locale settings on individual machines. For example, the word 'César' is split as 'CÃ
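The thread as shown here does not confirm a cause. As one possible illustration of how machine settings can change the characters an analyzer receives, the sketch below decodes the same UTF-8 bytes with two different charsets. It is an assumption that input bytes are being decoded with the platform default charset somewhere upstream of the analyzer; the class name `CharsetDecodeDemo` and the sample word are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDecodeDemo {
    // Decodes raw bytes with an explicitly named charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Assumed sample input: "César" encoded as UTF-8 (the accented e becomes two bytes).
        byte[] raw = "C\u00e9sar".getBytes(StandardCharsets.UTF_8);

        // The default charset can differ between machines, for example via locale settings.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Correct decoding yields 5 characters.
        System.out.println(decode(raw, StandardCharsets.UTF_8).length());      // prints 5

        // Wrong decoding turns the two-byte sequence into two separate characters.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1).length()); // prints 6
    }
}
```

Passing an explicit charset when reading input, rather than relying on the default, removes this one source of per-machine variation. Whether it applies to the problem reported here is not established by the messages above.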