RE: Searching by bit masks

2006-11-10 Thread Larry Taylor
Excellent, caching filters seem to fit the bill best, so I will use those with the flags stored in the underlying index in the format you suggested. Thank you for the assistance. Larry -----Original Message----- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Friday, November 10, 2006 12:27 PM

Re: Filter query method

2006-11-10 Thread Doron Cohen
You did not specify what's wrong - in what way is the code below not working as you expect? Two things to check: (1) search() and refindSearchResult() process the text of the first query differently. In search() the text is added to multiple fields ("metaField"). The way it is done btw would not

Re: result explanations / how to get the current document id inside a similarity subclass

2006-11-10 Thread Chris Hostetter
: Nevertheless, all values should be available during the calculation of the overall : score, which is done inside the Similarity class. Thus, collecting these should : result in nearly no runtime overhead; it's mainly a question of memory. Similarity instances don't calculate any scores

Re: Searching by bit masks

2006-11-10 Thread Doug Cutting
Erick Erickson wrote: Something like Document doc = new Document(); doc.add("flag1", "Y"); doc.add("flag2", "Y"); IndexWriter.add(doc); Fields have overheads. It would be more efficient to implement this as a single field with a different value for each boolean flag (as others have suggested
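
A minimal sketch of the single-field idea described above, written in plain Java with no Lucene calls. The class name `FlagTokens`, the method names, and the `flagN` token format are assumptions made for illustration; none of them come from the thread. Each set bit in a mask becomes one token, so a document's flags could be stored as several values of one field, and an "any of these flags" search reduces to checking whether the two token sets overlap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: turns a bit mask into per-flag tokens.
final class FlagTokens {

    // One token per set bit, lowest bit first: mask 5 (binary 101) gives flag0 and flag2.
    static List<String> toTokens(int mask) {
        List<String> tokens = new ArrayList<>();
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if ((mask & (1 << bit)) != 0) {
                tokens.add("flag" + bit);
            }
        }
        return tokens;
    }

    // True when the document's tokens share at least one flag with the query mask.
    static boolean matchesAny(List<String> docTokens, int queryMask) {
        for (String token : toTokens(queryMask)) {
            if (docTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

In an index, the tokens from `toTokens` would presumably be added as values of a single field, and the query side would OR together one term per token; how that is wired into Lucene is left out here because it depends on the Lucene version in use.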

Re: Highlighting span for Phrase Queries

2006-11-10 Thread mark harwood
There have been a couple of alternative Highlighter contributions recently, I can't recall which claim to support "proper" highlighting of phrases but you might want to give them a try. http://issues.apache.org/jira/browse/LUCENE-644 http://issues.apache.org/jira/browse/LUCENE-663 Ultimately

Highlighting span for Phrase Queries

2006-11-10 Thread Heikki Doeleman
Hi there, I have a question on using the Highlighter. I'm using Lucene in a web application that allows you to search the catalogue of a library. The idea is to highlight, in the results, the terms entered by the user. I'm using a Highlighter with a NullFragmenter because I want the whole fiel

Re: Indexing Performance issue

2006-11-10 Thread Ioan Cocan
You may want to use something like pdftotext, part of XPDF (http://www.foolabs.com/xpdf/download.html). It will produce a text extract for a PDF. Indexing will work like a breeze, without the memory consumption of PDFBox. Regards, Ioan spinergywmy wrote: Hi, I having this indexing the pdf file

Re: Indexing Performance issue

2006-11-10 Thread Erick Erickson
Have you measured to see how much of your time is spent indexing and how much is just parsing the file? You need to do this before having a clue what you need to make faster. Erick On 11/10/06, Daniel Naber <[EMAIL PROTECTED]> wrote: On Friday 10 November 2006 12:18, spinergywmy wrote: > I

Re: Indexing Performance issue

2006-11-10 Thread Daniel Naber
On Friday 10 November 2006 12:18, spinergywmy wrote: > I am having a performance issue indexing PDF files. It took me more > than 10 sec to index a PDF file of about 200 KB. Is it because I only have one > segment file? How can I make the indexing performance better? PDFBox (which I assume you are

Re: Searching by bit masks

2006-11-10 Thread John Haxby
Larry Taylor wrote: What we need to do is to be able to store a bit mask specifying various filter flags for a document in the index and then search this field by specifying another bit mask with desired filters, returning documents that have any of the specified flags set. In other words, we are

result explanations / how to get the current document id inside a similarity subclass

2006-11-10 Thread duiduder
Hello folks, we want to work with explanations of document scores inside result lists. In this context we are interested in the scores of the single terms from a query, for each document inside the result list: Query: "termA termB" Result: doc1 =>

Indexing Performance issue

2006-11-10 Thread spinergywmy
Hi, I am having a performance issue indexing PDF files. It took me more than 10 sec to index a PDF file of about 200 KB. Is it because I only have one segment file? How can I make the indexing performance better? Thanks, regards, Wooi Meng -- View this message in context: http://www.nabble

Re: Filter query method

2006-11-10 Thread spinergywmy
Hi Doron, I'm not sure I'm implementing your suggestion correctly. The way I did it is that I have 2 separate methods controlled by the check box. I used the basic search method the first time, and that looks up the index from the directory. After I got the result, I will check the checkbox and t