Re: Optimizing search speed & performance for a 10G Index.

2006-12-08 Thread Grant Ingersoll
lucene^1.5 content:lucene^1.0)) ((+(+content:linux +content:lucene)) +(site:contentsite1 site:contentsite2 site:contentsite3 site:contentsite4 site:contentsite5 site:contentsite6 site:contentsite7)))^0.01)) +location:australia) +newsdate:[20061107 TO 20061208] +region:au) -jobsite:ba

Re: Optimizing search speed & performance for a 10G Index.

2006-12-08 Thread Erick Erickson
+content:lucene)) +(site:contentsite1 site:contentsite2 site:contentsite3 site:contentsite4 site:contentsite5 site:contentsite6 site:contentsite7)))^0.01)) +location:australia) +newsdate:[20061107 TO 20061208] +region:au) -jobsite:badsite1 -region:badregion1 -jobsite:badsite2 -jobsite:badsite3 -jobsi

Custom Filter implementations - necessary to check if doc is deleted ?

2006-12-08 Thread Øyvind Stegard
Hi, I have a question that is probably easy to answer for many of you. I'm using some custom Filters with Lucene, mostly implemented by using TermEnum/TermDocs and checking some condition. Is it necessary to check the deleted-status of documents that the filter includes (and never actually include

RE: Reading Performance

2006-12-08 Thread Aigner, Thomas
I have tried the HitsCollector and the search time has improved by about 3/4 of a second. I still get really bad times when two or more people ask for data at the same time. The problem doesn't seem to be in writing the files; it's in getting data from the index when two or more people ask for la

Re: Reading Performance

2006-12-08 Thread Andrew Hudson
I think I've seen this problem when you use Lucene's built-in delete mechanism (IndexReader.deleteDocument, I believe). The problem was that it was synchronizing on a Java BitSet, which totally killed performance when more than one process was using the same IndexReader. A better way to do deletes is to

Re: Custom Filter implementations - necessary to check if doc is deleted ?

2006-12-08 Thread Chris Hostetter
: Is it necessary to check the deleted-status of documents that the filter : includes (and never actually include deleted documents), or is this done ... : I'm trying to implement an inverted version of a filter, simply by : flipping all the bits in the BitSet, after the filter has fini
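
The inversion described in the quoted question can be sketched with plain java.util.BitSet. This is only an illustration of the defensive variant (flip, then clear deleted documents); it does not settle whether that extra step is required. The class name, the invert helper, and the isDeleted predicate are hypothetical; with a real index the predicate would presumably delegate to IndexReader.isDeleted(int) and maxDoc would come from IndexReader.maxDoc().

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

public class InvertedFilterSketch {
    /**
     * Returns a copy of the filter bits with every bit in [0, maxDoc) flipped,
     * then clears the bits of documents that isDeleted reports as deleted.
     * isDeleted is a stand-in (assumption) for a check against the index.
     */
    static BitSet invert(BitSet original, int maxDoc, IntPredicate isDeleted) {
        BitSet inverted = (BitSet) original.clone();
        inverted.flip(0, maxDoc);
        // Walk only the set bits and drop the ones that belong to deleted docs.
        for (int doc = inverted.nextSetBit(0); doc >= 0; doc = inverted.nextSetBit(doc + 1)) {
            if (isDeleted.test(doc)) {
                inverted.clear(doc);
            }
        }
        return inverted;
    }

    public static void main(String[] args) {
        BitSet accepted = new BitSet(5);
        accepted.set(1);
        accepted.set(3);
        // Pretend document 4 has been deleted.
        BitSet inverted = invert(accepted, 5, doc -> doc == 4);
        System.out.println(inverted); // prints {0, 2}
    }
}
```

Working on a clone leaves the original filter's bits untouched, which matters if the original BitSet is cached and shared between searches.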

Filter question

2006-12-08 Thread Van Nguyen
I have a query that uses a filter... looking something like this:

    BooleanQuery filterQuery = new BooleanQuery();
    // add criteria
    QueryFilter qf = new QueryFilter(filterQuery);
    CachingWrapperFilter cwf = new CachingWrapperFilter(qf);

Re: Reading Performance

2006-12-08 Thread Erick Erickson
I'm stumped. It seems like it might be time to haul a profiler out. I'm particularly surprised because I put together a test system that fired a bunch of threads at a searcher and saw nothing like you're seeing, even with up to 30 simultaneous requests running. Do you know whether you're I/O bound,
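
A test of the kind mentioned above (many threads fired at one searcher) might look roughly like the following. This is a minimal sketch, not the harness actually used in the thread: the class name and timeConcurrently are hypothetical, and the searchTask placeholder stands in for whatever call runs a query against a shared searcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentSearchTimer {
    /**
     * Submits the same task to `threads` workers at once and returns the
     * elapsed wall-clock time of each run in milliseconds.
     */
    static List<Long> timeConcurrently(int threads, Callable<?> searchTask) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                jobs.add(() -> {
                    long start = System.nanoTime();
                    searchTask.call();
                    return (System.nanoTime() - start) / 1_000_000;
                });
            }
            List<Long> elapsed = new ArrayList<>();
            // invokeAll blocks until every job has finished.
            for (Future<Long> result : pool.invokeAll(jobs)) {
                elapsed.add(result.get());
            }
            return elapsed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload (assumption); replace with a real search call.
        Callable<Void> fakeSearch = () -> {
            Thread.sleep(20);
            return null;
        };
        System.out.println("1 thread:   " + timeConcurrently(1, fakeSearch));
        System.out.println("30 threads: " + timeConcurrently(30, fakeSearch));
    }
}
```

Comparing the per-request times for 1 thread against those for many threads gives a first hint of whether requests are contending for a shared resource; a profiler is still needed to find out which one.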

de-boosting fields

2006-12-08 Thread Scott Smith
I have a collection of documents for which I've always returned the results sorted on the date/time of the document (using a sort object in the search method on my Searcher). It works great. Suddenly, I have a requirement to return the documents in relevancy order. So, that's easy (I thought)

Re: de-boosting fields

2006-12-08 Thread Erick Erickson
I've certainly seen references to writing custom scorers, so it's possible. You might find valuable hints by searching the mail archive. I'll leave it to the more expert folks to suggest which is your best option. Although (and I'm talking beyond my competence here), it *may* work for you to asse

Re: Reading Performance

2006-12-08 Thread Chris Hostetter
: > on the searching. I still get really bad times when two or more people : > ask for data at the same time. The problem doesn't seem to be in : > writing the files, it's in getting data from the index when two or more : > people ask for large recordsets back (I can take all the I/O statements