Re: Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-08-05 Thread Lukas Vlcek
Mark, thanks a lot. Based on my first tests it seems that I will be able to finish my initial goal. I will be doing something like the following: for (int i = 0; i < hits.length(); i++) { String[] texts = hits.doc(i).getValues(lotid); for (String

Re: Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-08-05 Thread Mark Miller
It's questionable whether you are losing performance. Unless you have really large docs or a nasty slow analyzer, I have found it is usually faster or as fast to reanalyze as it is to use TermVectors, which can be quite time consuming to load up and assemble a TokenStream from. You might run some

Re: Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-07-30 Thread Mark Miller
Hey Lukas, I was being simplistic when I said that the text and TokenStream must be exactly the same. It's difficult to think of a reason why you would not want them to be the same though. Each Token records the offsets where it can be found in the original text -- that is how the Highlighter
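The point about Token offsets can be illustrated with a minimal sketch in plain Java. It does not use Lucene at all: the Tok class and the highlight method are hypothetical stand-ins, written on the assumption that a highlighter cuts the text purely by the start and end offsets each token carries. With that assumption, offsets computed for one text and applied to a different, shorter text run past its end.

```java
import java.util.Arrays;
import java.util.List;

public class OffsetHighlightSketch {

    // Hypothetical stand-in for a token: the term plus the character
    // offsets [start, end) where it was found in the analyzed text.
    public static final class Tok {
        public final String term;
        public final int start;
        public final int end;

        public Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Wraps each token's span in <B>...</B>, trusting the recorded offsets.
    // Tokens are assumed to be sorted by start offset and non-overlapping.
    public static String highlight(String text, List<Tok> tokens) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Tok t : tokens) {
            out.append(text.substring(last, t.start));   // text before the token
            out.append("<B>")
               .append(text.substring(t.start, t.end))   // the token itself
               .append("</B>");
            last = t.end;
        }
        out.append(text.substring(last));                // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        // Offsets 6..11 were computed against this text.
        String analyzed = "quick brown fox";
        List<Tok> tokens = Arrays.asList(new Tok("brown", 6, 11));

        System.out.println(highlight(analyzed, tokens)); // quick <B>brown</B> fox

        // The same tokens applied to a different, shorter text.
        try {
            highlight("fox", tokens);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("offsets do not fit this text: " + e.getMessage());
        }
    }
}
```

In this sketch the second call fails because offset 6 lies beyond the end of the three-character string, which is the same class of exception named in the thread subject.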

Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-07-28 Thread Lukas Vlcek
Hi Lucene experts, The following is a simple piece of Lucene code that throws a StringIndexOutOfBoundsException. I am using the official Lucene 2.2.0 release. Can anyone tell me what is wrong with this code? Is this a bug or a feature of Lucene? Any comments/hints are highly welcome! In a nutshell

Re: Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-07-28 Thread Mark Miller
Hey Lukas, Sorry I haven't gotten back to you on this sooner. Been meaning to, but I have been busy. Still am a little, but here is something to get you started: The token stream you send to the highlighter must match the text you send to the highlighter. Your token stream is this:

Re: Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-07-28 Thread Mark Miller
I am going to try to write up some more info for you tomorrow, but just to point out: I do think there is a bug in the way offsets are being handled. I don't think this is causing your current problem (what I mentioned is), but it will probably cause you problems down the road. I will look into

Re: Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).

2007-07-28 Thread Lukas Vlcek
Mark, thank you for this. I will wait for your other responses. This will keep me going :-) I didn't know that there is a design restriction in Lucene that the text and TokenStream must be exactly the same (still this seems redundant, I will dive into the Lucene API more). BR Lukas On 7/29/07,