Oops, you're right, term listings and counts for deleted docs are
adjusted during merges. I had the impression that optimize had some
special powers here that merge does not.
Thank you for bringing expungeDeletes to my attention.
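[Editor's note: the expungeDeletes option discussed here can be requested in Solr 1.4 as part of a commit, e.g. via an XML update message posted to /update. A sketch:

```xml
<!-- POST to /solr/update: commit and merge away segments containing deletes -->
<commit expungeDeletes="true"/>
```

Unlike a full optimize, this only rewrites segments that actually contain deleted documents.]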
On Sat, Nov 21, 2009 at 7:46 AM, Yonik Seeley
On Sat, Nov 21, 2009 at 12:33 AM, Lance Norskog goks...@gmail.com wrote:
And, terms whose documents have been deleted are not purged. So, you
can merge all you like and the index will not shrink back completely.
Under what conditions? Certainly not all, since I just tried a simple
test and a
Hoss,
Using Solr 1.4, I see constant index growth until an optimize. I
commit (hundreds of updates) every 5 minutes and have a mergeFactor of
10, but every 50 minutes, when those ten commits trigger a merge, I
don't see the index collapse down to its original size -- it comes out
slightly larger.
Over the course of a week, the index grew from
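[Editor's note: the merge cadence described above is governed by mergeFactor in solrconfig.xml; with these numbers (a commit every 5 minutes, mergeFactor 10), ten segments accumulate and merge roughly every 50 minutes. A sketch of the relevant Solr 1.4 setting:

```xml
<indexDefaults>
  <!-- Ten segments accumulate before a merge; higher values mean faster
       indexing but more segments on disk between merges. -->
  <mergeFactor>10</mergeFactor>
</indexDefaults>
```
]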
On Fri, Nov 20, 2009 at 12:24 PM, Michael solrco...@gmail.com wrote:
So -- I thought I understood you to mean that if I frequently merge,
it's basically the same as an optimize, and cruft will get purged. Am
I misunderstanding you?
That only applies to the segments involved in the merge. The
On Fri, Nov 20, 2009 at 2:32 PM, Michael solrco...@gmail.com wrote:
On Fri, Nov 20, 2009 at 12:35 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
On Fri, Nov 20, 2009 at 12:24 PM, Michael solrco...@gmail.com wrote:
So -- I thought I understood you to mean that if I frequently merge,
it's
And, terms whose documents have been deleted are not purged. So, you
can merge all you like and the index will not shrink back completely.
Only an optimize will remove the orphan terms.
This is important because the orphan terms affect relevance
calculations. So you really want to purge them with an optimize.
: Basically, search entries are keyed to other documents. We have finite storage,
: so we purge old documents. My understanding was that deleted documents still
: take space until an optimize is done. Therefore, if I don't optimize, the index
: size on disk will grow without bound.
On Tue, Nov 17, 2009 at 2:24 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: Basically, search entries are keyed to other documents. We have finite storage,
: so we purge old documents. My understanding was that deleted documents still
: take space until an optimize is done.
Otis Gospodnetic otis_gospodne...@yahoo.com wrote on 11/13/2009 11:15:43 PM:
Let's take a step back. Why do you need to optimize? You said: "As
long as I'm not optimizing, search and indexing times are
satisfactory." :)
You don't need to optimize just because you are continuously adding
- Original Message
From: Jerome L Quinn jlqu...@us.ibm.com
To: solr-user@lucene.apache.org
Sent: Mon, November 16, 2009 10:05:55 AM
Subject: Re: Solr 1.3 query and index perf tank during optimize
Otis Gospodnetic wrote on 11/13/2009 11:15:43 PM:
Let's take a step back
Good question!
The terms in the deleted documents are left behind, and so the
relevance behavior will be off. The other space used directly by
documents will be reabsorbed. (??)
On Sat, Nov 14, 2009 at 1:28 PM, Jerome L Quinn jlqu...@us.ibm.com wrote:
Lance Norskog goks...@gmail.com wrote on
I think we sorely need a Directory impl that down-prioritizes IO
performed by merging.
It would be wonderful if from Java we could simply set a per-thread
IO priority, but, it'll be a looong time until that's possible.
So I think for now we should make a Directory impl that emulates such
Another thing to try, is reducing the maxThreadCount for
ConcurrentMergeScheduler.
It defaults to 3, which I think is too high -- we should change this
default to 1 (I'll open a Lucene issue).
Mike
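[Editor's note: the setting Mike refers to lives on Lucene's ConcurrentMergeScheduler. In embedded Lucene code it would look roughly like this (a sketch against the Lucene 2.9 API; Solr 1.4 does not expose this knob in solrconfig.xml, so treat it as illustrative):

```java
import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter;

// Throttle merge concurrency so merges compete less with searches for IO.
void limitMergeThreads(IndexWriter writer) {
    ConcurrentMergeScheduler scheduler = new ConcurrentMergeScheduler();
    scheduler.setMaxThreadCount(1); // default is 3, per the message above
    writer.setMergeScheduler(scheduler);
}
```
]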
On Thu, Nov 12, 2009 at 6:30 PM, Jerome L Quinn jlqu...@us.ibm.com wrote:
Hi, everyone, this is
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless
luc...@mikemccandless.com wrote:
I think we sorely need a Directory impl that down-prioritizes IO
performed by merging.
Presumably this prioritizing Directory impl could wrap/decorate any
existing Directory.
Mike
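[Editor's note: the wrap-and-throttle idea can be sketched independently of Lucene's Directory API: decorate a write path and pause the writer whenever it gets ahead of an IO budget. Everything below -- the class names and the 1 MB/s budget -- is illustrative, not a Lucene API:

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Illustrative rate-limited stream: writes pass straight through, but the
// writer sleeps whenever it exceeds its bytes-per-second budget. A
// prioritizing Directory impl could wrap its outputs the same way.
class ThrottledOutputStream extends FilterOutputStream {
    private final double maxBytesPerSec;
    private long bytesWritten = 0;
    private final long startNanos = System.nanoTime();

    ThrottledOutputStream(OutputStream out, double maxBytesPerSec) {
        super(out);
        this.maxBytesPerSec = maxBytesPerSec;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        bytesWritten += len;
        double targetSec = bytesWritten / maxBytesPerSec;
        double elapsedSec = (System.nanoTime() - startNanos) / 1e9;
        long sleepMs = (long) ((targetSec - elapsedSec) * 1000);
        if (sleepMs > 0) {
            try {
                Thread.sleep(sleepMs); // back off: the merge thread yields IO
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while throttling");
            }
        }
    }
}

public class ThrottleDemo {
    // Push `chunks` 8 KB buffers through a throttled stream; returns bytes written.
    static long pump(int chunks, double bytesPerSec) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ThrottledOutputStream t = new ThrottledOutputStream(sink, bytesPerSec);
        byte[] chunk = new byte[8192];
        for (int i = 0; i < chunks; i++) {
            t.write(chunk, 0, chunk.length);
        }
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        long bytes = pump(100, 1_000_000); // ~800 KB at 1 MB/s budget
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.println(bytes + " bytes in " + String.format("%.2f", secs) + "s");
    }
}
```

A real version would wrap Lucene's Directory/IndexOutput rather than OutputStream, and would throttle only threads identified as merge threads.]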
Mark Miller markrmil...@gmail.com wrote on 11/12/2009 07:18:03 PM:
Ah, the pains of optimization. It's kind of just how it is. One solution
is to use two boxes and replication - optimize on the master, and then
queries only hit the slave. Out of reach for some though, and adds many
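[Editor's note: the two-box setup Mark describes uses Solr 1.4's built-in replication. A sketch of the master side in solrconfig.xml (values illustrative):

```xml
<!-- Master: slaves poll this handler and pull a fresh snapshot after optimize -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">optimize</str>
  </lst>
</requestHandler>
```

Queries go only to the slave, so the optimize-time IO storm stays on the master.]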
ysee...@gmail.com wrote on 11/13/2009 09:06:29 AM:
On Fri, Nov 13, 2009 at 6:27 AM, Michael McCandless
luc...@mikemccandless.com wrote:
I think we sorely need a Directory impl that down-prioritizes IO
performed by merging.
It's unclear if this case is caused by IO contention, or the OS
- Original Message
From: Jerome L Quinn jlqu...@us.ibm.com
To: solr-user@lucene.apache.org
Sent: Thu, November 12, 2009 6:30:42 PM
Subject: Solr 1.3 query and index perf tank during optimize
The 'maxSegments' feature is new with 1.4. I'm not sure that it will
cause any less disk I/O during optimize.
The 'mergeFactor=2' idea is not what you think: in this case the index
is always mostly optimized, so you never need to run optimize.
Indexing is always slower, because you amortize the
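[Editor's note: the maxSegments feature Lance mentions is exposed on the optimize command in Solr 1.4, e.g. as an XML update message. A sketch:

```xml
<!-- POST to /solr/update: partial optimize, stop at 10 segments instead of 1 -->
<optimize maxSegments="10"/>
```
]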
Hi, everyone, this is a problem I've had for quite a while,
and have basically avoided optimizing because of it. However,
eventually we will get to the point where we must delete as
well as add docs continuously.
I have a Solr 1.3 index with ~4M docs at around 90G. This is a single instance