Mark,
thanks a lot. Based on my first tests it seems that I will be able to finish
my initial goal.
I will be doing something like the following:
for (int i = 0; i < hits.length(); i++) {
String[] texts = hits.doc(i).getValues(lotid);
for (String text : texts) {
It's questionable whether you are losing performance. Unless you have really
large docs or a nasty slow analyzer, I have found it is usually faster
or as fast to reanalyze as it is to use TermVectors, which can be quite
time consuming to load up and assemble a TokenStream from. You might run
some
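If you do run that comparison, a tiny harness along these lines is enough (plain Java; the two Runnables below are placeholder workloads, not real Lucene calls — swap in your actual "reanalyze the stored text" and "load TermVectors and rebuild a TokenStream" code):

```java
public class TimingSketch {
    // Time `reps` runs of some work and return elapsed milliseconds.
    static long timeMillis(Runnable work, int reps) {
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) work.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Placeholder workloads — substitute the real reanalysis and
        // TermVector-loading code here.
        Runnable reanalyze = () -> "some stored text".toLowerCase().split("\\s+");
        Runnable termVectors = () -> new StringBuilder("some stored text").reverse();

        System.out.println("reanalyze:   " + timeMillis(reanalyze, 100_000) + " ms");
        System.out.println("termVectors: " + timeMillis(termVectors, 100_000) + " ms");
    }
}
```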
Hey Lukas,
I was being simplistic when I said that the text and TokenStream must be
exactly the same. It's difficult to think of a reason why you would not
want them to be the same though. Each Token records the offsets where it
can be found in the original text -- that is how the Highlighter
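To make the offset idea concrete, here is a self-contained sketch (plain Java with a stand-in Token class and a naive whitespace tokenizer — not Lucene's actual API) of a highlighter-style pass that uses each token's recorded start/end offsets to wrap query matches in the original text:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class OffsetHighlightSketch {
    // Stand-in for Lucene's Token: the term text plus its offsets
    // into the original string.
    static final class Token {
        final String term;
        final int start;
        final int end;
        Token(String term, int start, int end) {
            this.term = term; this.start = start; this.end = end;
        }
    }

    // Naive whitespace tokenizer that records where each token
    // lives in the original text.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) tokens.add(new Token(text.substring(start, i).toLowerCase(), start, i));
        }
        return tokens;
    }

    // Highlighter-style pass: copy the original text, wrapping matched
    // tokens in <B>...</B> using the offsets recorded at analysis time.
    static String highlight(String text, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Token t : tokenize(text)) {
            if (queryTerms.contains(t.term)) {
                out.append(text, last, t.start)
                   .append("<B>").append(text, t.start, t.end).append("</B>");
                last = t.end;
            }
        }
        out.append(text.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("The quick brown fox", Set.of("fox")));
        // The quick brown <B>fox</B>
    }
}
```

The point to notice is that highlight() only ever substrings the text it was given — which is why that text must be the same one the offsets were computed against.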
Hi Lucene experts,
The following is a simple piece of Lucene code which generates a
StringIndexOutOfBoundsException. I am using the Lucene 2.2.0 official
release. Can anyone tell me what is wrong with this code? Is this a bug or
a feature of Lucene? Any comments/hints highly welcomed!
In a nutshell
Hey Lukas,
Sorry I haven't gotten back to you on this sooner. I've been meaning to,
but I have been busy. Still am, a little, but here is something to get you started:
The token stream you send to the highlighter must match the text you
send to the highlighter.
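Here is a minimal, self-contained illustration (plain Java with made-up offsets, not Lucene code) of what goes wrong when they don't match: offsets recorded against one text, applied to a different one, blow up with the same StringIndexOutOfBoundsException reported earlier in this thread.

```java
public class MismatchSketch {
    // Offsets that were recorded when analyzing the ORIGINAL text
    // (hypothetical values: "fox" in "The quick brown fox").
    static final int START = 16, END = 19;

    public static void main(String[] args) {
        String analyzed = "The quick brown fox"; // text the offsets belong to
        String different = "fox";                // a different text passed at highlight time

        // Fine: the offsets fit the text they were computed against.
        System.out.println(analyzed.substring(START, END)); // fox

        // Broken: the same offsets applied to a different, shorter text.
        try {
            different.substring(START, END);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("mismatch -> " + e.getClass().getSimpleName());
        }
    }
}
```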
Your token stream is this:
I am going to try and write up some more info for you tomorrow, but
just to point out: I do think there is a bug in the way offsets are
being handled. I don't think this is causing your current problem (what
I mentioned is), but it will probably cause you problems down the road. I
will look into
Mark,
thank you for this. I will wait for your other responses.
This will keep me going :-)
I didn't know that there is a design restriction in Lucene that the text and
TokenStream must be exactly the same (it still seems redundant to me; I will
dive into the Lucene API more).
BR
Lukas
On 7/29/07,