Re: IndexSearcher memory leak?

2006-07-06 Thread Heng Mei
Thanks for the helpful tip, it makes sense now. I had previously assumed (wrongly) that RAMDirectory.close() would free up its memory buffers, but I guess I needed to RTFC: RAMDirectory.close() is just an empty method. On 7/5/06, Rob Staveley (Tom) <[EMAIL PROTECTED]> wrote: My two bits...

regarding priority of displaying paths of indexed files

2006-07-06 Thread amit_kkumar
Hi all, I want to ask: when files are indexed and a query is run, in what order are the paths of the matching files displayed? Is the file with the highest number of matches displayed before the others? Regards, Amit Kumar

RE: IndexSearcher memory leak?

2006-07-06 Thread Rob Staveley (Tom)
> i guess I needed to RTFC I found that recently too. My only contribution to Lucene has been asking for a Javadoc addition to prevent others from falling into a trap, which I fell into. My issue was http://issues.apache.org/jira/browse/LUCENE-594. Similarly, you could ask for a Javadoc comment fo

Berkeley DB JEDirectory Performance

2006-07-06 Thread Johannes Christen
Hi all. I just want to share my experience with the Berkeley DB JEDirectory implementation from the contrib area. I spent two days evaluating and testing it and found that it does work, but has very bad performance and very high disk requirements for a medium-sized document volume. I indexed

Re: BitSet in a HitCollector

2006-07-06 Thread James Pine
Hey, Sorry, I will explain a bit more about my collect method. Currently my collect method is executing IndexSearcher.doc(id) and storing some stuff in a Map which I can then retrieve from the HitCollector (much like the example in the Lucene In Action book). Of course that's somewhat expensive, s

Re: BitSet in a HitCollector

2006-07-06 Thread Tricia Williams
Hi James, A paper was mentioned on this list in the last couple of months which presents a solution to your sampling problem without having to know the total results size in advance. The paper (http://www2005.org/cdrom/docs/p245.pdf) presents two solutions which utilize a random variable.
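
The snippet above does not say which two solutions the paper describes, so the following is not a reconstruction of them. As a minimal sketch of one standard way to sample hits without knowing the total result size in advance, here is plain reservoir sampling in Java. The class and method names (ReservoirSampler, offer) are hypothetical, and the code has no Lucene dependency; the idea would be to call offer(docId) once per hit from a collect callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Keeps a uniform random sample of at most `capacity` doc ids from a stream of unknown length. */
final class ReservoirSampler {
    private final int capacity;
    private final Random random;
    private final List<Integer> sample = new ArrayList<>();
    private long seen = 0;

    ReservoirSampler(int capacity, Random random) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.random = random;
    }

    /** Call once per matching document, for example from a hit-collecting callback. */
    void offer(int docId) {
        seen++;
        if (sample.size() < capacity) {
            // The first `capacity` hits always enter the reservoir.
            sample.add(docId);
            return;
        }
        // Pick a slot in [0, seen); the new hit is kept only if the slot falls
        // inside the reservoir, i.e. with probability capacity / seen.
        long slot = (long) (random.nextDouble() * seen);
        if (slot < capacity) {
            sample.set((int) slot, docId);
        }
    }

    /** Returns a copy of the current sample. */
    List<Integer> sample() {
        return new ArrayList<>(sample);
    }

    /** Number of hits offered so far. */
    long seen() {
        return seen;
    }
}
```

Because only the sampled ids are kept, the expensive document lookup can be deferred until collection has finished and then done for the sampled ids only.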

Re: Berkeley DB JEDirectory Performance

2006-07-06 Thread Erick Erickson
Thanks, I always appreciate someone else doing work for me. Best, Erick

Managing a large archival (and constantly changing) database

2006-07-06 Thread Scott Smith
I've been asked to do a project which provides full-text search for a large database of articles. The expectation is that most of the articles are fairly small (<2k bytes). There will be an initial population of around 400,000 articles. There will then be approximately 2000 new articles added ea

RE: Managing a large archival (and constantly changing) database

2006-07-06 Thread Larry Ogrodnek
We have a similar setup, although probably only 1/5th the number of documents and updates. I'd suggest just making periodic index backups. I've been storing my index as follows: //data/ (lucene index directory) //backups/ The "data" is what's passed into IndexWriter/IndexReader. Additionally,
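
The paths in the message above are cut off in this listing, so the layout below is only an illustration. As a rough sketch of a periodic backup, assuming the index is a flat directory of files and that nothing is writing to it while the copy runs, the live "data" directory can be copied into a labelled subdirectory of "backups". The class name IndexBackupSketch, the method backup, and the label argument are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class IndexBackupSketch {
    /**
     * Copies every regular file in dataDir into backupsDir/label and returns
     * the backup directory. Assumes no writer is modifying dataDir meanwhile.
     */
    static Path backup(Path dataDir, Path backupsDir, String label) throws IOException {
        Path target = backupsDir.resolve(label);
        Files.createDirectories(target);

        // Collect the listing first so the directory stream is closed before copying.
        List<Path> files;
        try (Stream<Path> listing = Files.list(dataDir)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            // Fails if the file already exists, so each label should be unique.
            Files.copy(file, target.resolve(file.getFileName()));
        }
        return target;
    }
}
```

A timestamp makes a natural label, so that successive backups never collide and old ones can be pruned by name.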

RE: Managing a large archival (and constantly changing) database

2006-07-06 Thread James Pine
Hey, I found this thread to be very useful when deciding upon an indexing strategy. http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12700.html The system I work on has 3 million or so documents and it was (until a non-Lucene performance issue came up) set up to add/delete new docum

Finding docNum of a given indexed file

2006-07-06 Thread Maurice Yarrow
Hello Lucene community. Having looked at the API and at numerous email postings and exchanges, I see that updating a particular document in the index that represents a given file that has changed involves 1) deleting with deleteDocument (of either IndexReader or IndexModifier) and then 2

BooleanQuery question

2006-07-06 Thread Van Nguyen
I have a BooleanQuery that looks like this: BooleanQuery query = new BooleanQuery(); TermQuery term1 = new TermQuery(new Term(ID, "1234")); TermQuery term2 = new TermQuery(new Term(ID, "2344")); TermQuery term3 = new TermQuery(new Term(ID, "2323")); TermQuery termLocation = new TermQuery

Re: BooleanQuery question

2006-07-06 Thread Michael D. Curtin
Van Nguyen wrote: I just want results that have: ID: 1234 OR 2344 OR 2323 LOCATION: A1 LANGUAGE: ENU This query returns everything from my index. How would I create a query that will only return results that must have LOCATION and LANGUAGE and have only those three IDs. I think you'll ne
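
The reply is cut off in this listing, so what follows is not a reconstruction of the advice given. It is a plain-Java model, with no Lucene dependency, of the logic the question asks for: the three IDs form one OR group, and that group is required together with LOCATION and LANGUAGE. In Lucene this shape is commonly expressed by putting the ID clauses in their own inner BooleanQuery and adding that inner query as a required clause of the outer one. The class name NestedBooleanSketch and the map-based document are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class NestedBooleanSketch {
    // Inner group: any one of these IDs satisfies the group (OR).
    private static final Set<String> WANTED_IDS =
            new HashSet<>(Arrays.asList("1234", "2344", "2323"));

    /** Returns true when doc satisfies (ID in group) AND LOCATION:A1 AND LANGUAGE:ENU. */
    static boolean matches(Map<String, String> doc) {
        boolean idGroup = WANTED_IDS.contains(doc.get("ID"));
        boolean location = "A1".equals(doc.get("LOCATION"));
        boolean language = "ENU".equals(doc.get("LANGUAGE"));
        // Outer query: all three parts are required.
        return idGroup && location && language;
    }
}
```

The point of the nesting is that the OR applies only among the IDs; a document with the right LOCATION and LANGUAGE but a different ID is still rejected.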

RE: Managing a large archival (and constantly changing) database

2006-07-06 Thread Chris Hostetter
: I found this thread to be very useful when deciding : upon an indexing strategy. : : http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12700.html FYI: that thread was the basis of the mechanism Solr uses to create "snapshots" of indexes for replication from a Master to multiple Slav

Re: Finding docNum of a given indexed file

2006-07-06 Thread Chris Hostetter
: Could the file name (fully qualified filepath/filename) be used as the : search : term ? : : Could the entire file be stringified (one long string, with or without : new-lines) : and that be used as the term (probably not, since not tokenized) ? either of those can work -- it all depends on how

Searching for similiar texts

2006-07-06 Thread Dominik Bruhn
Hi, I index articles using two fields, one for the title and one for the text. Now I want to display 5 similar articles for every article while it is being viewed. How can I manage this? Are there any premade solutions? Thanks -- Dominik Bruhn mailto: [EMAIL PROTECTED] http://www.dbruhn.de --

Re: Searching for similiar texts

2006-07-06 Thread Otis Gospodnetic
Look for MoreLikeThis class in Lucene's contrib/ directory. Otis - Original Message From: Dominik Bruhn <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Thursday, July 6, 2006 7:54:20 PM Subject: Searching for similiar texts Hy, I index articles using two fields, one for the ti

Lucene search formula

2006-07-06 Thread Rajiv Roopan
Hello, I was recently looking through the Lucene in Action book and came across the scoring formula. I was wondering if the formula has changed since the book was written? Also, I was wondering if someone could briefly explain what the IDF(t) term in the formula means? In the book it says that it's th

RE: Function writing using lucene

2006-07-06 Thread Amit
Thanks, Erick, for the reply. It will help us. Regards, Amit -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 05, 2006 6:32 PM To: java-user@lucene.apache.org; [EMAIL PROTECTED] Subject: Re: Function writing using lucene Amit: You can make a

Re: Lucene search formula

2006-07-06 Thread Chris Hostetter
:I was recently looking thru the lucene in action book and came across the : scoring formula. I was wondering if the formula has changed since the book : was written? no, but the book has some mistakes, and the scoring formula is one of them... http://lucenebook.com/blog/errata/ http://lucene

Re: Lucene search formula

2006-07-06 Thread Otis Gospodnetic
The formula hasn't changed (but the first printing of the book had a portion of it missing, check javadoc for (Default?)Similarity for the real and current formula). Here is a simple IDF example, or at least how I "visualize" IDF. You have an index with a bunch of documents and terms in it. A t
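
The example in the message is cut off here, so as a small stand-in, this is a sketch of the IDF calculation as I understand it from the DefaultSimilarity javadoc of that period: idf = ln(numDocs / (docFreq + 1)) + 1. Treat the formula as an assumption to check against the javadoc for the version in use; the class name IdfSketch and the index sizes are made up for illustration.

```java
final class IdfSketch {
    /**
     * Inverse document frequency for a term.
     * docFreq: number of documents containing the term.
     * numDocs: total number of documents in the index.
     */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 1000; // hypothetical index size
        // A term found in few documents gets a higher idf than a term found in most.
        System.out.println("rare term   (docFreq=9):   " + idf(9, numDocs));
        System.out.println("common term (docFreq=999): " + idf(999, numDocs));
    }
}
```

The rarer the term, the larger the factor, which is why a match on an unusual word contributes more to a document's score than a match on a word that appears almost everywhere.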