Re: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread KK
st. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: KK [mailto:dioxide.softw...@gmail.com] > > Sent: Thursday, May 21, 2009 7:01 PM

RE: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread Uwe Schindler
Hi KK,

> right? and remove this conversion that I'm doing later,
>
> byte [] utfEncodeByteArray = textOnly.getBytes();
> String utfString = new String(utfEncodeByteArray, Charset.forName("UTF-8"));
>
> This will make sure I'm not depending on the platform encoding, right?

In principle, ye

RE: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread Uwe Schindler
user@lucene.apache.org > Subject: Re: Posting unicode data to lucene not working during > searching/retreival! > > I did all the changes but no improvement. the data is getting indexed > properly, I think because I'm able to see the results through luke and > luke > has opti

Re: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread KK
I made all the changes, but there is no improvement. The data is getting indexed properly, I think, because I can see the results through Luke, and Luke has an option for viewing the results in both UTF-8 encoding and the default string encoding. I tried both, but there was no difference. In both cases I'm able t

Re: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread KK
Thanks @Uwe. # To answer the question in your last mail: textOnly is the output of the method downloadPage(), i.e. the complete text including all HTML tags etc. # Instead of doing the encode/decode later, what I should do is set the charset to UTF-8 when downloading the page through the buffered reader, as you me
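To illustrate the approach KK describes, here is a minimal Java sketch that decodes the downloaded bytes with an explicit charset at read time, so no later re-encoding is needed. The class `PageReader` and the method `readAll` are hypothetical names, not KK's actual `downloadPage()`. The sketch also assumes the page really is UTF-8, which a real crawler would have to check against the HTTP headers or the page's own declaration. `StandardCharsets.UTF_8` (Java 7+) is equivalent to `Charset.forName("UTF-8")`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PageReader {
    // Hypothetical helper: decodes the whole stream using the given charset
    // instead of relying on the platform default.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Devanagari sample written with Unicode escapes so the source file's
        // own encoding cannot interfere.
        String original = "<p>\u0939\u093f\u0928\u094d\u0926\u0940</p>";
        // Stand-in for the network stream of a UTF-8 encoded page.
        InputStream in =
            new ByteArrayInputStream(original.getBytes(StandardCharsets.UTF_8));
        String decoded = readAll(in, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(original)); // true
    }
}
```

Once the text has been decoded correctly at this point, it is an ordinary Java String and can be handed on without any further byte conversion.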

RE: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread Uwe Schindler
I forgot:

> byte [] utfEncodeByteArray = textOnly.getBytes();
> String utfString = new String(utfEncodeByteArray, Charset.forName("UTF-8"));
>
> here textonly is the text extracted from the downloaded page

What is textonly here? If it is a String, why encode it and then decode it again? The impor
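A small Java sketch of why the quoted round trip does not help: `getBytes()` without an argument encodes with the platform default charset, so decoding the result as UTF-8 is a no-op at best and lossy at worst. To make this reproducible, the sketch passes the "platform" charset explicitly. `RoundTripDemo` and `encodeThenDecode` are hypothetical names, and ISO-8859-1 is only an assumed example of a non-UTF-8 default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {
    // Mimics textOnly.getBytes() on a machine whose default charset is
    // `platform`, followed by decoding the bytes as UTF-8.
    static String encodeThenDecode(String text, Charset platform) {
        byte[] bytes = text.getBytes(platform);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Six Devanagari characters, written as Unicode escapes.
        String text = "\u0939\u093f\u0928\u094d\u0926\u0940";

        // Default charset happens to be UTF-8: the round trip changes nothing.
        System.out.println(
            encodeThenDecode(text, StandardCharsets.UTF_8).equals(text)); // true

        // Default charset cannot represent the text: each character is
        // replaced by '?' during encoding, and decoding cannot undo that.
        System.out.println(
            encodeThenDecode(text, StandardCharsets.ISO_8859_1)); // ??????
    }
}
```

The second case reproduces the "?" symptom from the original question: the damage happens in the encoding step, so the later UTF-8 decode has nothing left to recover.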

RE: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread Uwe Schindler
, charset)
> > and so on. When you then print stored fields you must do the same in the other direction. So the general rule: Always specify the correct charset when converting between strings and bytes.
> >
> > For searching: It also roughly depends on the Anal
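The "other direction" mentioned above can be sketched in Java as follows: when printing a stored field, wrap the output stream in a `PrintStream` with an explicit charset rather than relying on the default. `Utf8Output`, `utf8Stream`, and `printToBytes` are hypothetical names for illustration. Whether the characters then display correctly also depends on the terminal or browser interpreting the bytes as UTF-8, which this sketch cannot guarantee.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8Output {
    // Wraps any output stream so that printed strings are encoded as UTF-8.
    static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Captures the bytes that printing `s` produces, for inspection.
    static byte[] printToBytes(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream ps = utf8Stream(buf);
        ps.print(s);
        ps.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for a stored field value read back from the index.
        String field = "\u0939\u093f\u0928\u094d\u0926\u0940";
        byte[] printed = printToBytes(field);
        // The printed bytes match an explicit UTF-8 encoding of the string.
        System.out.println(
            Arrays.equals(printed, field.getBytes(StandardCharsets.UTF_8))); // true
        // For real output, the same wrapper can be applied to System.out.
        utf8Stream(System.out).println(field);
    }
}
```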

Re: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread KK
> > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: KK [mailto:dioxide.softw...@gmail.com] > > Sent: Thursday, May 21, 2009 3:25 PM > > To: java-us

RE: Posting unicode data to lucene not working during searching/retreival!

2009-05-20 Thread Uwe Schindler
://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: KK [mailto:dioxide.softw...@gmail.com] > Sent: Thursday, May 21, 2009 3:25 PM > To: java-user@lucene.apache.org > Subject: Posting unicode data to lucene not working during > searching/retreival! > > Ho

Posting unicode data to lucene not working during searching/retreival!

2009-05-20 Thread KK
How do I post UTF-8 encoded Unicode data to a Lucene index? Do we have to specify something special, such as a flag saying that we're posting Unicode data? I tried to post some UTF-8 encoded data, but during retrieval I can't see that data; there are just "?" marks in all those places. Earlier I was