ucene takes care of the rest.
Thanks, everybody, for the suggestions.
dario
>From: "redpineseed" <[EMAIL PROTECTED]>
>Reply-To: "Lucene Users List" <[EMAIL PROTECTED]>
>To: "Lucene Users List" <[EMAIL PROTECTED]>
>Subject: Re: setting encodin
> The biggest problem is some cp1252 characters are "private" in the unicode
> byte set.
Those characters may not be in the Unicode character set at all, and that is the
major trouble with processing Chinese.
Convert your natively encoded text to Unicode (UTF-16) with the following lines:
File
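For anyone reading this in the archive: the snippet above is cut off, so here is a rough sketch of that kind of conversion, not the poster's original lines. It assumes the JDK's standard charset support. The class name NativeToUnicode and both method names are made up for illustration, and "windows-1252" is just an example charset name (a Chinese encoding name would be passed the same way). The idea is only that bytes are decoded with an explicit charset into a Java String, which is UTF-16 in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of Lucene or of the original message.
public class NativeToUnicode {

    // Decode raw bytes in a named native encoding into a Java String.
    public static String decode(byte[] nativeBytes, String charsetName) {
        return new String(nativeBytes, Charset.forName(charsetName));
    }

    // Read a whole file through a Reader that decodes with the given charset,
    // instead of relying on the platform default encoding.
    public static String readFile(Path path, String charsetName) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader in = new BufferedReader(new InputStreamReader(
                Files.newInputStream(path), Charset.forName(charsetName)))) {
            char[] buffer = new char[4096];
            int count;
            while ((count = in.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }
}
```

The resulting String (or a Reader built the same way) is what would then be handed to the indexing code. For example, the cp1252 byte 0x80 decodes to the euro sign, U+20AC.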
I don't know how to have Lucene store text in cp1252 (Windows Latin-1), but I don't
think you have to.
I'm pretty sure it will take whatever information you have in a Java String
and save it as Unicode, then recreate it as a Java String when you read it back.
So the issue I think you have is converting from cp1252 into a J