st.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: KK [mailto:dioxide.softw...@gmail.com]
> > Sent: Thursday, May 21, 2009 7:01 PM
Hi KK,
> right? and remove this conversion that I'm doing later ,
>
> byte [] utfEncodeByteArray = textOnly.getBytes();
> String utfString = new String(utfEncodeByteArray, Charset.forName("UTF-8"));
>
> This will make sure I'm not depending on the platform encoding, right?
In principle, ye
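
A minimal sketch of why the quoted snippet can still depend on the platform encoding: `textOnly.getBytes()` without an argument encodes with the JVM default charset, so decoding those bytes as UTF-8 only round-trips when that default happens to be UTF-8. The class name, method name, and sample text below are hypothetical, and US-ASCII merely stands in for a non-UTF-8 platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Encode text to bytes with one charset, then decode the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Hypothetical sample: one Latin-1 character and one CJK character.
        String text = "\u00e9\u4e2d";

        // Same charset in both directions: the text survives unchanged.
        String ok = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(text)); // prints true

        // Encoding with a charset that cannot represent the characters
        // (US-ASCII here, standing in for a non-UTF-8 platform default)
        // replaces each unmappable character with '?' before decoding starts.
        String lossy = roundTrip(text, StandardCharsets.US_ASCII, StandardCharsets.UTF_8);
        System.out.println(lossy); // prints ??
    }
}
```

Once the characters have been replaced during encoding, no later decoding step can recover them, which matches the "?" marks described in the original question.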
user@lucene.apache.org
> Subject: Re: Posting unicode data to lucene not working during
> searching/retreival!
>
> I made all the changes, but there is no improvement. The data is getting indexed
> properly, I think, because I'm able to see the results through Luke, and Luke
> has opti
I made all the changes, but there is no improvement. The data is getting indexed
properly, I think, because I'm able to see the results through Luke, and Luke
has an option for seeing the results in both UTF-8 encoding and the string default
encoding. I tried both, but it made no difference. In both cases I'm able
t
Thanks @Uwe.
#To answer your last mail's query, textOnly is the output of the method
downloadPage(), the complete text including all HTML tags etc.
#Instead of doing the encode/decode later, what I should do is, when
downloading the page through a buffered reader, set the charset to UTF-8 as you
me
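
A rough sketch of the approach described in this message: decode the downloaded bytes with an explicit charset at the moment they are read, so no later encode/decode step is needed. It assumes the page really is served as UTF-8. `ReadWithCharset` and `readAll` are hypothetical names (the thread's `downloadPage()` is not shown), and an in-memory stream stands in for the network connection.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadWithCharset {
    // Read the whole stream into a String, decoding bytes with the given
    // charset instead of the platform default.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical page content, encoded as UTF-8 the way a server would send it.
        String original = "<p>\u00e9\u4e2d</p>";
        byte[] page = original.getBytes(StandardCharsets.UTF_8);

        String text = readAll(new ByteArrayInputStream(page), StandardCharsets.UTF_8);
        System.out.println(text.equals(original)); // prints true
    }
}
```

With a real download, the stream would come from the connection, and the charset would ideally be taken from the response headers or the page's own declaration rather than hard-coded.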
I forgot:
> byte [] utfEncodeByteArray = textOnly.getBytes();
> String utfString = new String(utfEncodeByteArray, Charset.forName("UTF-8"));
>
> here textonly is the text extracted from the downloaded page
What is textonly here? If it is a String, why encode it and then decode
it again? The impor
,
> charset)
> > and so on. When you then print stored fields you must do the same in the
> > other direction. So the general rule: Always specify the correct charset
> > when converting to/from strings to bytes.
> > For searching: It also depends roughly on the Anal
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: KK [mailto:dioxide.softw...@gmail.com]
> > Sent: Thursday, May 21, 2009 3:25 PM
> > To: java-us
://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: KK [mailto:dioxide.softw...@gmail.com]
> Sent: Thursday, May 21, 2009 3:25 PM
> To: java-user@lucene.apache.org
> Subject: Posting unicode data to lucene not working during
> searching/retreival!
>
> Ho
How do I post UTF-8 encoded Unicode data to a Lucene index? Do we have to specify
something special, any sort of flag saying that we're posting Unicode data?
I tried to post some UTF-8 encoded data, but during retrieval I'm not able to
see the data; there are just "?" marks in all those places. Earlier I was