Re: Solr 1.3 query and index perf tank during optimize

2009-11-22 Thread Lance Norskog
Oops, you're right, term listings and counts for deleted docs are adjusted during merges. I had the impression that optimize had some special powers here that merge does not. Thank you for bringing expungeDeletes to my attention.
On Sat, Nov 21, 2009 at 7:46 AM, Yonik Seeley
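
For readers who have not met expungeDeletes: in Solr 1.4 it is an attribute on the XML commit message. A minimal sketch of building that message follows. The update URL is an assumption (a default local install), the helper names are made up for illustration, and nothing is actually sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: a default local Solr install; adjust for your deployment.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_commit_xml(expunge_deletes=False):
    """Return a Solr XML commit message (hypothetical helper, not a Solr API)."""
    commit = ET.Element("commit")
    if expunge_deletes:
        # Asks Solr to merge away segments that contain deleted documents.
        commit.set("expungeDeletes", "true")
    return ET.tostring(commit, encoding="unicode")


def build_update_request(body, url=SOLR_UPDATE_URL):
    """Prepare (but do not send) an HTTP POST carrying the XML message."""
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


commit_xml = build_commit_xml(expunge_deletes=True)
request = build_update_request(commit_xml)
print(commit_xml)
```

Passing the prepared request to urllib.request.urlopen would send it; that step is left out so the sketch can be run without a server.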

Re: Solr 1.3 query and index perf tank during optimize

2009-11-21 Thread Yonik Seeley
On Sat, Nov 21, 2009 at 12:33 AM, Lance Norskog goks...@gmail.com wrote:
> And, terms whose documents have been deleted are not purged. So, you can merge all you like and the index will not shrink back completely.
Under what conditions? Certainly not all, since I just tried a simple test and a

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Michael
Hoss, Using Solr 1.4, I see constant index growth until an optimize. I commit (hundreds of updates) every 5 minutes and have a mergefactor of 10, but every 50 minutes I don't see the index collapse down to its original size -- it's slightly larger. Over the course of a week, the index grew from

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Yonik Seeley
On Fri, Nov 20, 2009 at 12:24 PM, Michael solrco...@gmail.com wrote:
> So -- I thought I understood you to mean that if I frequently merge, it's basically the same as an optimize, and cruft will get purged. Am I misunderstanding you?
That only applies to the segments involved in the merge. The
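
The point that a merge only cleans up the segments it touches can be pictured with a toy model. This is a sketch, not Lucene code: Segment and merge are hypothetical names, and the document counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    live_docs: int
    deleted_docs: int  # marked deleted, but still occupying space on disk


def merge(segments, indexes):
    """Merge the chosen segments into one new segment.

    Deleted docs in the chosen segments are dropped; every other
    segment is carried over unchanged, deletions included.
    """
    chosen = set(indexes)
    merged = Segment(
        live_docs=sum(segments[i].live_docs for i in chosen),
        deleted_docs=0,
    )
    untouched = [s for i, s in enumerate(segments) if i not in chosen]
    return untouched + [merged]


def deleted_total(segments):
    return sum(s.deleted_docs for s in segments)


# One big old segment plus two small recent ones (made-up numbers).
index = [Segment(1_000_000, 50_000), Segment(10_000, 500), Segment(10_000, 700)]

after_small_merge = merge(index, [1, 2])   # routine merge of the small segments
after_full_merge = merge(index, [0, 1, 2])  # what an optimize amounts to

print(deleted_total(after_small_merge))
print(deleted_total(after_full_merge))
```

In this model the deletions sitting in the big segment survive every routine merge that skips it, which matches the slow index growth reported earlier in the thread.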

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Michael
On Fri, Nov 20, 2009 at 12:35 PM, Yonik Seeley yo...@lucidimagination.com wrote:
> On Fri, Nov 20, 2009 at 12:24 PM, Michael solrco...@gmail.com wrote:
>> So -- I thought I understood you to mean that if I frequently merge, it's basically the same as an optimize, and cruft will get purged. Am I

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Yonik Seeley
On Fri, Nov 20, 2009 at 2:32 PM, Michael solrco...@gmail.com wrote:
> On Fri, Nov 20, 2009 at 12:35 PM, Yonik Seeley yo...@lucidimagination.com wrote:
>> On Fri, Nov 20, 2009 at 12:24 PM, Michael solrco...@gmail.com wrote:
>>> So -- I thought I understood you to mean that if I frequently merge, it's

Re: Solr 1.3 query and index perf tank during optimize

2009-11-20 Thread Lance Norskog
And, terms whose documents have been deleted are not purged. So, you can merge all you like and the index will not shrink back completely. Only an optimize will remove the orphan terms. This is important because the orphan terms affect relevance calculations. So you really want to purge them with

Re: Solr 1.3 query and index perf tank during optimize

2009-11-17 Thread Chris Hostetter
: Basically, search entries are keyed to other documents. We have finite storage,
: so we purge old documents. My understanding was that deleted documents still
: take space until an optimize is done. Therefore, if I don't optimize, the index
: size on disk will grow without bound.

Re: Solr 1.3 query and index perf tank during optimize

2009-11-17 Thread Israel Ekpo
On Tue, Nov 17, 2009 at 2:24 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
> : Basically, search entries are keyed to other documents. We have finite storage,
> : so we purge old documents. My understanding was that deleted documents still
> : take space until an optimize is done.

Re: Solr 1.3 query and index perf tank during optimize

2009-11-16 Thread Jerome L Quinn
Otis Gospodnetic otis_gospodne...@yahoo.com wrote on 11/13/2009 11:15:43 PM:
> Let's take a step back. Why do you need to optimize? You said: "As long as I'm not optimizing, search and indexing times are satisfactory." :)
> You don't need to optimize just because you are continuously adding

Re: Solr 1.3 query and index perf tank during optimize

2009-11-16 Thread Otis Gospodnetic
- Original Message
From: Jerome L Quinn jlqu...@us.ibm.com
To: solr-user@lucene.apache.org
Sent: Mon, November 16, 2009 10:05:55 AM
Subject: Re: Solr 1.3 query and index perf tank during optimize
> Otis Gospodnetic wrote on 11/13/2009 11:15:43 PM:
>> Let's take a step back

Re: Solr 1.3 query and index perf tank during optimize

2009-11-14 Thread Lance Norskog
Good question! The terms in the deleted documents are left behind, and so the relevance behavior will be off. The other space used directly by documents will be reabsorbed. (??)
On Sat, Nov 14, 2009 at 1:28 PM, Jerome L Quinn jlqu...@us.ibm.com wrote:
> Lance Norskog goks...@gmail.com wrote on

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Michael McCandless
I think we sorely need a Directory impl that down-prioritizes IO performed by merging. It would be wonderful if from Java we could simply set a per-thread IO priority, but, it'll be a looong time until that's possible. So I think for now we should make a Directory impl that emulates such

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Michael McCandless
Another thing to try is reducing the maxThreadCount for ConcurrentMergeScheduler. It defaults to 3, which I think is too high -- we should change this default to 1 (I'll open a Lucene issue).
Mike
On Thu, Nov 12, 2009 at 6:30 PM, Jerome L Quinn jlqu...@us.ibm.com wrote:
> Hi, everyone, this is

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Michael McCandless
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless luc...@mikemccandless.com wrote:
> I think we sorely need a Directory impl that down-prioritizes IO performed by merging.
Presumably this prioritizing Directory impl could wrap/decorate any existing Directory.
Mike
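
To make the wrap/decorate idea concrete, here is a rough sketch of the pattern in Python rather than as a real Lucene Directory. ThrottledWriter is a hypothetical name, and a fixed bytes-per-second cap is only one assumed way of emulating lower IO priority; the thread does not say how such a Directory would actually do it.

```python
import io
import time


class ThrottledWriter:
    """Decorator around any object with write(); caps the write rate.

    The clock and sleep functions are injectable so the pacing logic
    can be exercised without real waiting.
    """

    def __init__(self, inner, max_bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self._inner = inner
        self._rate = float(max_bytes_per_sec)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()  # earliest time the next write may start

    def write(self, data):
        now = self._clock()
        if now < self._next_allowed:
            # Previous writes used up the budget; wait before touching the disk.
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        written = self._inner.write(data)
        # Each byte written pushes the next permitted start further out.
        self._next_allowed = now + len(data) / self._rate
        return written


# Usage: wrap the stream used by merge-like background work and leave
# the streams used by searches and normal indexing unwrapped.
sink = io.BytesIO()
merge_out = ThrottledWriter(sink, max_bytes_per_sec=1_000_000_000)
merge_out.write(b"merged segment bytes")
print(sink.getvalue())
```

Because the wrapper only needs write(), it can sit in front of any existing stream, which is the same property Mike is asking for from a Directory that decorates another Directory.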

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Jerome L Quinn
Mark Miller markrmil...@gmail.com wrote on 11/12/2009 07:18:03 PM:
> Ah, the pains of optimization. It's kind of just how it is. One solution is to use two boxes and replication - optimize on the master, and then queries only hit the slave. Out of reach for some though, and adds many

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Jerome L Quinn
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM:
> On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless luc...@mikemccandless.com wrote:
>> I think we sorely need a Directory impl that down-prioritizes IO performed by merging.
> It's unclear if this case is caused by IO contention, or the OS

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Jerome L Quinn
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM:
> On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless luc...@mikemccandless.com wrote:
>> I think we sorely need a Directory impl that down-prioritizes IO performed by merging.
> It's unclear if this case is caused by IO contention, or the OS

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Otis Gospodnetic
://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Jerome L Quinn jlqu...@us.ibm.com To: solr-user@lucene.apache.org Sent: Thu, November 12, 2009 6:30:42 PM Subject: Solr 1.3 query and index perf tank during optimize

Re: Solr 1.3 query and index perf tank during optimize

2009-11-13 Thread Lance Norskog
The 'maxSegments' feature is new with 1.4. I'm not sure that it will cause any less disk I/O during optimize. The 'mergeFactor=2' idea is not what you think: in this case the index is always mostly optimized, so you never need to run optimize. Indexing is always slower, because you amortize the
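
The mergeFactor trade-off described here can be seen in a toy simulation. This is a simplified model of a logarithmic merge policy, not Lucene's actual implementation: every flush is assumed to produce one equal-sized segment, and a merge fires as soon as mergeFactor segments of the same size exist. The simulate function and its unit of "flushed segments rewritten" are illustrative assumptions.

```python
def simulate(flushes, merge_factor):
    """Return (segments remaining, flushed-segment units rewritten by merges)."""
    levels = []  # levels[i] = number of segments of size merge_factor ** i
    copied = 0
    for _ in range(flushes):
        if not levels:
            levels.append(0)
        levels[0] += 1  # a flush adds one smallest-size segment
        level = 0
        while levels[level] == merge_factor:
            # merge_factor equal-sized segments collapse into one larger segment
            levels[level] = 0
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            copied += merge_factor ** (level + 1)  # size of the segment just written
            level += 1
    return sum(levels), copied


for mf in (2, 10):
    segments, copied = simulate(99, mf)
    print(f"mergeFactor={mf}: {segments} segments, {copied} units rewritten")
```

In this model a low mergeFactor leaves far fewer segments on disk at any moment, and pays for it by rewriting the same data many more times during indexing.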

Solr 1.3 query and index perf tank during optimize

2009-11-12 Thread Jerome L Quinn
Hi, everyone, this is a problem I've had for quite a while, and have basically avoided optimizing because of it. However, eventually we will get to the point where we must delete as well as add docs continuously. I have a Solr 1.3 index with ~4M docs at around 90G. This is a single instance

Re: Solr 1.3 query and index perf tank during optimize

2009-11-12 Thread Mark Miller
Jerome L Quinn wrote:
> Hi, everyone, this is a problem I've had for quite a while, and have basically avoided optimizing because of it. However, eventually we will get to the point where we must delete as well as add docs continuously. I have a Solr 1.3 index with ~4M docs at around 90G.