Thanks!
Yes, you're right. Highlighting all hits at one time may cause problems. A
method for paging through the hits is needed to avoid this.
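A minimal sketch of what such paging could look like, so that only the hits on the requested page are handed to the highlighter. The class name HitPager and its page method are hypothetical helpers made up for illustration, not part of Lucene, and the hits are assumed to already be available as a java.util.List:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a Lucene class): slices a full hit list into pages.
final class HitPager {

    private HitPager() {}

    /**
     * Returns the hits on the given zero-based page.
     * Returns an empty list when the page starts past the last hit.
     */
    static <T> List<T> page(List<T> hits, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize > 0");
        }
        // Use long arithmetic so a large page number cannot overflow int.
        long from = (long) pageNumber * pageSize;
        if (from >= hits.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, (long) hits.size());
        // Copy the slice so the caller does not hold a view of the full list.
        return new ArrayList<>(hits.subList((int) from, to));
    }
}
```

With something like this, the highlighter would be run only on the documents returned by page(...), so the cost of highlighting is bounded by the page size rather than by the total number of hits.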
Also, if we read the contents of the original file into a string and pass
it to the highlighter at the searching stage, this could also cause problems
when large orig
I have some specific questions (using a pure Java solution with
PostgreSQL). Any comments/info would be much appreciated.
1) I build an index. Then, as the db gets more "documents" inserted
into it, they need to be added to the index. I was thinking of checking
for X documents (will store in a vecto
Hi,
I am using MemoryIndex as described here:
http://dsd.lbl.gov/nux/api/org/apache/lucene/index/memory/MemoryIndex.html .
I am using it to match lots (tens of thousands) of queries against one document. Then
I want to rank them based on score and some other variables. I want to know if
there i
I did run down the issue. And it's a case of tired coder. I wasn't
creating a new document object in the method I was using to handle word
documents.
Thanks very much for the links guys, I appreciate it!
steve.
Chris Hostetter wrote:
: I dump the doc files into a text file with the same var
On 27 Nov 2005, at 00:24, Jerry Stern wrote:
I wonder how to highlight the searched word when performing a
full-text search with Lucene.
At the indexing stage, the contents of an original file are
regarded as a FIELD of a Lucene document:
private static void indexFile(File file, Ind
: I dump the doc files into a text file with the same variable I use in
: the Lucene doc.add(Field.UnStored("content", textStr)); and they look
: fine in the file. However searches return nothing.
if i'm reading that sentence correctly, then you are saying that you've
tried isolating your MS-Wor
While writing a simple stress testing exercise, I came across the
strange condition that the IndexReader locks the index even though it's
only supposed to be reading.
Now, I understand that IndexReader can in fact modify the index (no
matter how unintuitive that is) but it seems to me that a l
Hello Steven,
There is a small ready-to-use framework in Lucene in Action that you can
use to index MS Word, PDF, RTF, XML, and plain-text docs -
http://lucenebook.com/ . I suggest you stick with POI libraries, as it
looks like Textmining code is no longer maintained.
Otis
--- Steven Bell <[EMAI