RE: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Uwe Schindler
The finalize() approach does not work correctly, because the reader still holds references to other objects when it is not explicitly closed. Because it references them, finalize() is never called, as the reader is never eligible to be gc'd. You must close the reader explicitly, that's all. So just close it after using it. With Near Rea

Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Jamie
Uwe, if I recall correctly, when you call writer.getReader(), the returned IndexReader can consume a lot of memory with large indexes. To ensure that the same index reader is reused across multiple search threads, I keep a cached copy of the reader and return it. If a search thread closes the r

Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Michael McCandless
Opening an NRT reader per search can be too costly if you have a high search rate. It's better to rate-limit reopens in that case, e.g. to at most 10 per second (one every 100 msec). There's a useful class in the Lucene in Action 2 source code (NOTE: I am a co-author), SearcherManager, which simplifi
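The rate-limiting idea above can be sketched without any Lucene dependency. This is a minimal illustration, not the SearcherManager class from the book: ReopenRateLimiter, maybeReopen and the injectable clock are hypothetical names, and the Runnable stands in for whatever code actually reopens the reader.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper (not a Lucene class): lets a reopen action run at most once per interval. */
final class ReopenRateLimiter {
    private final long minIntervalMillis;
    private final LongSupplier clock;   // injectable time source, e.g. System::currentTimeMillis
    private final AtomicLong lastReopen;

    ReopenRateLimiter(long minIntervalMillis, LongSupplier clock) {
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
        // Backdate the last reopen so the very first call is allowed through.
        this.lastReopen = new AtomicLong(clock.getAsLong() - minIntervalMillis);
    }

    /**
     * Runs reopenAction only if at least minIntervalMillis have elapsed since the
     * last run. Returns true if the action ran, false if the call was skipped.
     */
    boolean maybeReopen(Runnable reopenAction) {
        long now = clock.getAsLong();
        long last = lastReopen.get();
        if (now - last < minIntervalMillis) {
            return false;               // too soon: keep serving the current reader
        }
        if (!lastReopen.compareAndSet(last, now)) {
            return false;               // another thread claimed this reopen slot
        }
        reopenAction.run();
        return true;
    }
}
```

With an interval of 100 msec this caps reopens at roughly 10 per second; search threads whose call is skipped simply continue with the reader they already have.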

Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Michael McCandless
Comments inline... On Thu, Sep 30, 2010 at 5:26 AM, Jamie wrote: > Uwe > > If I recall correctly when you call writer.getReader(), the returned > IndexReader can consume a lot of memory with large indexes. The reopened reader shares sub-readers with the previous one, so, if all that's changed sin

Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Jamie
Hi Michael / Uwe >It's good to cache the reader, but, finalize would worry me too since >you have no control over when GC gets around to calling it... you risk >tying up resources for longer than necessary. I did it this way, as I didn't want to overcomplicate the code by introducing mechanis

RE: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Uwe Schindler
Hi Jamie, > >It's good to cache the reader, but, finalize would worry me too since >you > have no control over when GC gets around to calling it... you risk >tying up > resources for longer than necessary. > > I did it this way, as I didn't want to over complicate the code by introducing > mecha

Merge policy, optimization for small frequently changing indexes.

2010-09-30 Thread Naveen Kumar
Hi, I have a very large number (say 3 million) of frequently changing small indexes. 90% of these indexes contain about 50 documents, while a few (2-3%) have about 100,000 documents each (these being the more frequently used indexes). Each index belongs to a signed-in user, thus can have unpre

How Does Fuzzy Query Work ??

2010-09-30 Thread ahmed algohary
Hi all, I wonder how Lucene's FuzzyQuery works, as it seems to take much longer than a normal query. Does it generate all the possible terms and search for them? -- Ahmed Elgohary

how to get the first term from index?

2010-09-30 Thread Sahin Buyrukbilen
Hi all, I need to get the first term in my index and iterate it. Can anybody help me? Best.

Re: how to get the first term from index?

2010-09-30 Thread Anshum
Hi Sahin, In case you intend to get an enumerator on the terms in an index, you could call indexreader.terms() on your IndexReader to get the enumerator and just iterate. http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/index/IndexReader.html#terms() Hope thi

Re: how to get the first term from index?

2010-09-30 Thread Sahin Buyrukbilen
Thank you Anshum, it seems to be working; I need to play with it. On Thu, Sep 30, 2010 at 2:34 PM, Anshum wrote: > Hi Sahin, > In case you intend to get an enumerator on the terms in an index, you could > use the following call [indexreader.terms()] from IndexReader to get the > enumerator on t

Re: How Does Fuzzy Query Work ??

2010-09-30 Thread Robert Muir
On Thu, Sep 30, 2010 at 8:41 AM, ahmed algohary wrote: > Hi all, > > I wonder how lucene FuzzyQuery works as it seems to take much longer time > than a normal query. Does it generate all the possible terms and search for > them ?? > > In current versions of lucene it is documented to be slow: "War
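The reply above is cut off, so the following only illustrates the kind of work a brute-force fuzzy match implies; it is not Lucene's implementation. It assumes, in line with the scalability warning in the Lucene 3.0 FuzzyQuery Javadoc, that with a prefix length of 0 every term in the index is enumerated and scored by edit distance. FuzzyScanSketch, editDistance, matchingTerms and the maxDistance cutoff are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Lucene code): score every term against the query by edit distance. */
final class FuzzyScanSketch {

    /** Classic Levenshtein distance using two rolling rows of the DP table. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                        // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                        // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Visits every term once; cost grows with the number of distinct terms in the index. */
    static List<String> matchingTerms(List<String> allTerms, String query, int maxDistance) {
        List<String> matches = new ArrayList<String>();
        for (String term : allTerms) {
            if (editDistance(term, query) <= maxDistance) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

Under that assumption the query does not generate candidate terms up front; it scans the existing terms, which is why a non-zero prefix length (fewer terms to visit) makes such a query cheaper.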

RE: Problem searching in the same sentence

2010-09-30 Thread Sirish Vadala
I have tried the below code:
    Field field = new Field(fieldName, validFieldValue,
        (store) ? Field.Store.YES : Field.Store.NO,
        (tokenize) ? Field.Index.ANALYZED : Field.Index.NOT_ANALYZED,
        Field.TermVector.WITH_POSITIONS_OFFSETS);
However, I still have the same problem. It

Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Michael McCandless
You can also use the IndexReader's incRef/decRef methods. Mike On Thu, Sep 30, 2010 at 6:12 AM, Uwe Schindler wrote: > Hi Jamie, >>  >It's good to cache the reader, but, finalize would worry me too since >>you >> have no control over when GC gets around to calling it... you risk  >tying > up >>
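The reference-counting pattern behind incRef/decRef can be sketched with a stand-in class. RefCountedResource below is hypothetical and is not Lucene's IndexReader; it only assumes the usual convention that the creator holds the first reference, each search thread takes one before using the shared object and releases it afterwards, and the underlying resources are freed when the count reaches zero.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a shared reader; illustrates reference counting only. */
final class RefCountedResource {
    // The creator implicitly holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    /** Takes a reference; fails if the resource has already been released. */
    void incRef() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("resource is already closed");
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return;
            }
        }
    }

    /** Drops a reference; the last release closes the resource. */
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            closed = true;          // a real reader would release its file handles here
        } else if (remaining < 0) {
            throw new IllegalStateException("decRef called more often than incRef");
        }
    }

    boolean isClosed() {
        return closed;
    }

    int getRefCount() {
        return refCount.get();
    }
}
```

A search thread would call incRef() before searching and decRef() in a finally block, so a cached reader that has been swapped out is closed as soon as the last in-flight search finishes rather than whenever the garbage collector runs a finalizer.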

Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Michael McCandless
On Thu, Sep 30, 2010 at 5:59 AM, Jamie wrote: >  Hi Michael / Uwe > >>It's good to cache the reader, but, finalize would worry me too since >>you have no control over when GC gets around to calling it... you risk >>tying up resources for longer than necessary. > > I did it this way, as I didn't wa

Looking for advice on using Lucene to semantically compare two documents

2010-09-30 Thread Jonathan Ciampi
Advice on comparing two documents. Summary This project is not a search engine but a semantic comparison between two documents. The purpose of this application is to assist users in modifying the text in a document to improve the relevancy rank of the document to another document. For exampl

Re: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Jamie
Hi Mike, I managed to get hold of a copy of your book through Safari Books. Quite an impressive online reading system they have there! I integrated your SearcherManager class into our code, but I am still seeing file handles marked deleted in the index directory. I am running the following com

RE: File Handle Leaks During Lucene 3.0.2 Merge

2010-09-30 Thread Uwe Schindler
Hi Jamie, Yes, this is expected for the reasons described above (segments are still referenced by the open IndexReaders, but the files were already deleted by IndexWriter). The number of open but already deleted files should stay approximately stable. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-282