Re: background merge hit exception

2008-07-21 Thread sandyg
Hi, thanks for the reply. But I had enough space on my disk. Michael McCandless-2 wrote: > > > Normally when optimize throws this exception, either it includes a > "caused by" in the exception, or, you should have seen a stack trace > from a merge thread printed to your stderr. > > One quic

Re: background merge hit exception

2008-07-21 Thread Michael McCandless
Did you see a "caused by" in your stack trace? Or, a separate unhandled exception in a thread? Is this reproducible? Also, did you have a reader open on the index when you ran the optimize? If so, you'd need 2X the index size (e.g. 20 GB with the example below) free. Mike sandyg wrote
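
A minimal sketch of the pattern Mike describes, assuming Lucene 2.3-era APIs; the index path is a placeholder:

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class OptimizeWithCause {
        public static void main(String[] args) throws IOException {
            // "/path/to/index" stands in for the real index directory.
            IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer());
            try {
                // optimize() merges all segments down to one. While an open
                // IndexReader still references the old segments, old and new
                // segments coexist on disk, so up to 2X the index size in
                // free space may be needed.
                writer.optimize();
            } catch (IOException e) {
                e.printStackTrace();                    // the merge exception
                if (e.getCause() != null) {
                    e.getCause().printStackTrace();     // the underlying "caused by"
                }
            } finally {
                writer.close();
            }
        }
    }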

Re: GeoSort approach - your opinion

2008-07-21 Thread Toke Eskildsen
On Sat, 2008-07-19 at 11:53 +0200, Sascha Fahl wrote: > last week I implemented an approach for GeoSort in Lucene. Inspired by > "Lucene in Action" I modified the algorithm in the following way. When > an IndexReader for a certain index is created, a cache for > geo information is created - this
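
A hypothetical sketch of the per-reader cache idea (the GeoCache class and the "lat"/"lng" field names are illustrative, not from the original post): coordinates are loaded once per IndexReader and reused across searches.

    import java.io.IOException;
    import java.util.Map;
    import java.util.WeakHashMap;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;

    public class GeoCache {
        // Weak keys let an entry disappear once its IndexReader is closed
        // and garbage-collected.
        private static final Map<IndexReader, float[][]> CACHE =
            new WeakHashMap<IndexReader, float[][]>();

        public static synchronized float[][] get(IndexReader reader) throws IOException {
            float[][] coords = CACHE.get(reader);
            if (coords == null) {
                // FieldCache already caches per reader; it is used here to
                // mirror the "cache created when the reader is created" idea.
                float[] lat = FieldCache.DEFAULT.getFloats(reader, "lat");
                float[] lng = FieldCache.DEFAULT.getFloats(reader, "lng");
                coords = new float[][] { lat, lng };
                CACHE.put(reader, coords);
            }
            return coords;
        }
    }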

Re: How to avoid duplicate records in lucene

2008-07-21 Thread Sebastin
At the time of search, while querying the data. markrmiller wrote: > > Sebastin wrote: >> Hi All, >> >> Is there any possibility to avoid duplicate records in Lucene 2.3.1? >> > I don't believe that there is a very high performance way to do this. > You are basically going to have to query the

Re: How to avoid duplicate records in lucene

2008-07-21 Thread Erick Erickson
Could you define duplicate? As far as I know, you don't get the same (internal) doc id back more than once, so what is a duplicate? Best, Erick On Mon, Jul 21, 2008 at 9:40 AM, Sebastin <[EMAIL PROTECTED]> wrote: > > At the time of search, while querying the data > markrmiller wrote: > > > > Sebast

Re: How to avoid duplicate records in lucene

2008-07-21 Thread markharw00d
>> Could you define duplicate? That's whatever field you choose to de-dup on. That could be a field such as "DatabasePrimaryKey" or perhaps a field containing an MD5 hash of document content. The DuplicateFilter ensures only one document can exist in results for each unique value for th
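
An untested usage sketch of the DuplicateFilter from Lucene's contrib/queries module, reusing the "DatabasePrimaryKey" field named above; the index path and query are placeholders:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.DuplicateFilter; // contrib/queries
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class DedupSearch {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("/path/to/index");
            // Keep only one hit per unique value of "DatabasePrimaryKey".
            DuplicateFilter filter = new DuplicateFilter("DatabasePrimaryKey");
            Hits hits = searcher.search(new TermQuery(new Term("body", "lucene")), filter);
            System.out.println(hits.length() + " de-duplicated hits");
            searcher.close();
        }
    }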

Re: How to avoid duplicate records in lucene

2008-07-21 Thread eks dev
You could maintain your own Bloom filter and check only "positives" with an exact search to see whether they are false positives. If you have a small percentage of duplicates (unique documents dominate updates), this will help a lot on the performance side. - Original Message > From: markharw00d <[EMA
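
A hypothetical sketch of this Bloom-filter pre-check (BloomFilter is a stand-in for any implementation, not a Lucene class, and the "id" field name is illustrative): the cheap probabilistic check runs first, and only possible duplicates pay for an exact lookup.

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;

    interface BloomFilter {
        boolean mightContain(String key); // false means "certainly unseen"
        void add(String key);
    }

    public class DedupCheck {
        private final BloomFilter seen;
        private final IndexReader reader;

        public DedupCheck(BloomFilter seen, IndexReader reader) {
            this.seen = seen;
            this.reader = reader;
        }

        public boolean isDuplicate(String key) throws IOException {
            if (!seen.mightContain(key)) {
                seen.add(key); // definitely new: remember it, skip the search
                return false;
            }
            // The filter can report false positives, so confirm with an
            // exact (and more expensive) term lookup.
            return reader.docFreq(new Term("id", key)) > 0;
        }
    }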

RE: Boolean expression for no terms OR matching a wildcard

2008-07-21 Thread Steven A Rowe
Hi Ronald, Caveat - I haven't tested this, but: With a RegexQuery, I think you can do something like (using your example): +abc*123 -{Regex}(?!abc.*123$) This query would include all documents that hav
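
An untested rendering of this query in code, echoing Steve's own caveat; the trailing ".*" is an added assumption, since the contrib RegexQuery matches whole terms and a bare lookahead would match only an empty term:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.WildcardQuery;
    import org.apache.lucene.search.regex.RegexQuery; // contrib/regex

    public class WildcardOrNothingQuery {
        public static BooleanQuery build(String field) {
            BooleanQuery query = new BooleanQuery();
            // Must contain at least one term matching the wildcard...
            query.add(new WildcardQuery(new Term(field, "abc*123")),
                      BooleanClause.Occur.MUST);
            // ...and no term that fails to match it (negative lookahead).
            query.add(new RegexQuery(new Term(field, "(?!abc.*123$).*")),
                      BooleanClause.Occur.MUST_NOT);
            return query;
        }
    }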

Re: Boolean expression for no terms OR matching a wildcard

2008-07-21 Thread Ronald Rudy
Thanks Steve, this looks promising even if it doesn't perform the best. I'll run some tests on what produces the best results. -Ron On Jul 21, 2008, at 3:00 PM, Steven A Rowe wrote: Hi Ronald, Caveat - I haven't tested this, but: With a RegexQuery

Storing information

2008-07-21 Thread blazingwolf7
Hi, I am using Lucene to perform searching. I have certain information that will be loaded every time a search is run. This means, if there are multiple users running the search at the same time, the information will be loaded multiple times. This is not efficient at all, so I was wondering is t

Re: Storing information

2008-07-21 Thread Yonik Seeley
On Mon, Jul 21, 2008 at 11:27 PM, blazingwolf7 <[EMAIL PROTECTED]> wrote: > I am using Lucene to perform searching. I have certain information that will > be loaded every time a search is run. This means, if there are multiple users > running the search at the same time, the information will be loade
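
Yonik's full answer is truncated above; a common fix for this situation is to load the shared data once and reuse it across all searches. A hypothetical sketch (the SearchData class and loadFromDisk() are illustrative, not from the thread):

    public class SearchData {
        // The JVM runs a static initializer exactly once, even when many
        // search threads race to touch the class, so the load happens once.
        private static final SearchData INSTANCE = loadFromDisk();

        public static SearchData getInstance() {
            return INSTANCE;
        }

        private static SearchData loadFromDisk() {
            // ... read the shared information here, once, at class load ...
            return new SearchData();
        }
    }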