Excellent, caching filters seem to fit the bill best, so I will use those
with the flags stored in the underlying index in the format you
suggested. Thank you for the assistance.
Larry
-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Friday, November 10, 2006 12:27 PM
You did not specify what's wrong - in what way is the code below not
working as you expect?
Two things to check:
(1) search() and refindSearchResult() process the text of the first query
differently. In search() the text is added to multiple fields
("metaField"). The way it is done btw would not
: Nevertheless, all values should be available during the calculation of the
: overall score, which is done inside the Similarity class. Thus, collecting
: these should result in nearly no runtime overhead; it's mainly a question
: of memory.
Similarity instances don't calculate any scores
Erick Erickson wrote:
Something like
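Document doc = new Document();
// Index each flag as an untokenized keyword field (not stored)
doc.add(new Field("flag1", "Y", Field.Store.NO, Field.Index.UN_TOKENIZED));
doc.add(new Field("flag2", "Y", Field.Store.NO, Field.Index.UN_TOKENIZED));
writer.addDocument(doc); // writer is an already-open IndexWriter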
Fields have overheads. It would be more efficient to implement this as
a single field with a different value for each boolean flag (as others
have suggested
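
A minimal sketch of the single-field idea, in plain Java with no Lucene calls. The class name, the flag names, and the bit layout are all hypothetical, chosen only for illustration: each set bit in a document's mask becomes one token, and all tokens would go into one field instead of one field per flag.

```java
import java.util.ArrayList;
import java.util.List;

public class FlagTerms {
    // Hypothetical flag names; bit i of the mask corresponds to FLAG_NAMES[i].
    static final String[] FLAG_NAMES = {"flag1", "flag2", "flag3", "flag4"};

    /** Returns one token per set bit, in bit order. */
    static List<String> toTerms(int mask) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < FLAG_NAMES.length; i++) {
            if ((mask & (1 << i)) != 0) {
                terms.add(FLAG_NAMES[i]);
            }
        }
        return terms;
    }

    /** True if the document has at least one of the requested flags set. */
    static boolean matchesAny(int docMask, int queryMask) {
        return (docMask & queryMask) != 0;
    }

    public static void main(String[] args) {
        System.out.println(toTerms(0b0101));             // bits 0 and 2 are set
        System.out.println(matchesAny(0b0101, 0b0110));  // bit 2 is shared
    }
}
```

With tokens like these in one field, an "any of these flags" search would presumably become an OR over the tokens produced from the query mask; how that is wired into a query or a cached filter is left open here.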
There have been a couple of alternative Highlighter contributions recently. I
can't recall which of them claim to support "proper" highlighting of phrases, but
you might want to give them a try.
http://issues.apache.org/jira/browse/LUCENE-644
http://issues.apache.org/jira/browse/LUCENE-663
Ultimately
Hi there,
I have a question on using the Highlighter.
I'm using Lucene in a web application that allows you to search the
catalogue of a library. The idea is to highlight, in the results, the
terms entered by the user. I'm using a Highlighter with a NullFragmenter
because I want the whole fiel
You may want to use something like pdftotext, part of Xpdf
(http://www.foolabs.com/xpdf/download.html). It will produce a text
extract for a PDF. Indexing will work like a breeze, without the memory
consumption of PDFBox.
Regards,
Ioan
spinergywmy wrote:
Hi,
I having this indexing the pdf file
Have you measured to see how much of your time is spent indexing and how
much is just parsing the file? You need to do this before you can know what
you need to make faster.
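
One rough way to take that measurement is to time the two phases separately. This is only a sketch: the class name and the two placeholder tasks are hypothetical, and the real PDF extraction and index-writing calls would go where the comments indicate.

```java
public class PhaseTimer {
    /** Runs the task once and returns the elapsed wall-clock time in nanoseconds. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the two phases being compared.
        Runnable parsePdf = () -> { /* extract the text from the PDF here */ };
        Runnable indexText = () -> { /* add the extracted text to the index here */ };

        long parseNs = timeNanos(parsePdf);
        long indexNs = timeNanos(indexText);
        System.out.printf("parse: %d ms, index: %d ms%n",
                parseNs / 1_000_000, indexNs / 1_000_000);
    }
}
```

A single run is noisy, so repeating the measurement over several files gives a more trustworthy split.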
Erick
On 11/10/06, Daniel Naber <[EMAIL PROTECTED]> wrote:
On Friday 10 November 2006 12:18, spinergywmy wrote:
> I
On Friday 10 November 2006 12:18, spinergywmy wrote:
> I am having a performance issue when indexing PDF files. It took me more
> than 10 sec to index a PDF file of about 200 KB. Is it because I only have
> a single segment file? How can I make the indexing performance better?
PDFBox (which I assume you are
Larry Taylor wrote:
What we need to do is to be able to store a bit mask specifying various
filter flags for a document in the index and then search this field by
specifying another bit mask with desired filters, returning documents
that have any of the specified flags set. In other words, we are
Hello folks,
we want to work with explanations of document scores inside result lists.
In this context we are interested in the scores of the individual terms of a
query, for each document in the result list:
Query:
"termA termB"
Result:
doc1 =>
Hi,
I am having a performance issue when indexing PDF files. It took me more
than 10 sec to index a PDF file of about 200 KB. Is it because I only have
a single segment file? How can I make the indexing performance better?
Thanks
regards,
Wooi Meng
Hi Doron,
I'm not sure I'm implementing your suggestion correctly.
What I did is write 2 separate methods controlled by the check box.
I used the basic search method the first time, which looks up the
index from the directory. After I got the result, I will check the checkbox
and t
13 matches