Re: Optimize taking two steps and extra disk space

Michael McCandless Tue, 21 Jun 2011 02:52:59 -0700

OK that sounds like a good solution!
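The reason 105 should be enough can be seen with a simplified model. This is an assumption, not TieredMergePolicy's exact selection logic. In the model, each forced-merge pass groups the current segments into merges of at most maxMergeAtOnceExplicit segments, and each merge writes one new segment. The number of passes is then how many times the segment count can be shrunk by that factor before it reaches one. The class and method names below are hypothetical.

```java
/**
 * Simplified model (an assumption, not Lucene's actual TieredMergePolicy code):
 * each forced-merge pass groups the current segments into merges of at most
 * maxMergeAtOnceExplicit segments, and each merge produces one new segment.
 */
class ForcedMergePasses {

    /** Returns how many passes it takes to reduce segmentCount to a single segment. */
    static int passes(int segmentCount, int maxMergeAtOnceExplicit) {
        if (segmentCount < 1) {
            throw new IllegalArgumentException("segmentCount must be >= 1");
        }
        if (maxMergeAtOnceExplicit < 2) {
            throw new IllegalArgumentException("maxMergeAtOnceExplicit must be >= 2");
        }
        int passes = 0;
        int remaining = segmentCount;
        while (remaining > 1) {
            // Ceiling division: every group of up to maxMergeAtOnceExplicit becomes one segment.
            remaining = (remaining + maxMergeAtOnceExplicit - 1) / maxMergeAtOnceExplicit;
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        int segments = 70; // roughly what segmentsPerTier=35 can leave behind
        System.out.println("width 30:  " + passes(segments, 30) + " pass(es)");
        System.out.println("width 105: " + passes(segments, 105) + " pass(es)");
    }
}
```

With about 70 segments, a width of 30 takes two passes (70 -> 3 -> 1). That matches the two-step optimize in the subject line. A width of 105 finishes in one pass, so the whole index is rewritten only once.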

You can also have CMS (ConcurrentMergeScheduler) limit how many merges
are allowed to run at once, if your I/O system has trouble with that
much concurrency.
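In Lucene, this limit is set on ConcurrentMergeScheduler itself. The setter names differ between versions (newer releases have setMaxMergesAndThreads), so check the javadocs for the version you run. The sketch below does not use Lucene. It is a toy illustration of the idea, with a hypothetical BoundedMergeRunner class: merges are submitted from many threads, but a semaphore caps how many may do I/O at the same time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy illustration (hypothetical class, not Lucene's ConcurrentMergeScheduler). */
class BoundedMergeRunner {
    private final Semaphore permits;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    BoundedMergeRunner(int maxConcurrentMerges) {
        this.permits = new Semaphore(maxConcurrentMerges);
    }

    /** Runs one simulated merge, blocking while the concurrency limit is reached. */
    void runMerge(long simulatedMillis) throws InterruptedException {
        permits.acquire();
        try {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // remember the highest concurrency seen
            Thread.sleep(simulatedMillis);         // stand-in for reading/writing segment files
        } finally {
            running.decrementAndGet();
            permits.release();
        }
    }

    int peakConcurrency() {
        return peak.get();
    }

    /** Submits mergeCount simulated merges at once and returns the observed peak concurrency. */
    static int simulate(int maxConcurrentMerges, int mergeCount) {
        BoundedMergeRunner runner = new BoundedMergeRunner(maxConcurrentMerges);
        ExecutorService pool = Executors.newFixedThreadPool(mergeCount);
        for (int i = 0; i < mergeCount; i++) {
            pool.submit(() -> {
                runner.runMerge(20);
                return null;
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return runner.peakConcurrency();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent merges: " + simulate(1, 3));
    }
}
```

With a limit of 1, the three merges are serialized instead of all hitting slow disks at once. Each merge takes longer to start, but indexing threads are less likely to stall behind a pile of concurrent merge I/O.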


Mike McCandless

http://blog.mikemccandless.com

On Mon, Jun 20, 2011 at 6:29 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 6/20/2011 3:18 PM, Michael McCandless wrote:
>>
>> With segmentsPerTier at 35 you will easily cross 70 segs in the index...
>> If you want optimize to run in a single merge, I would lower
>> segmentsPerTier and mergeAtOnce (maybe back to the 10 default), and set
>> your maxMergeAtOnceExplicit to 70 or higher...
>>
>> Lower mergeAtOnce means merges run more frequently but for a shorter
>> time, and your searching should be faster (than 35/35) since there
>> are fewer segments to visit.
>
> Thanks again for more detailed information.  There is method to my madness,
> which I will now try to explain.
>
> With a value of 10, the reindex involves enough merges that there are
> many second-level merges and a third-level merge.  I was running into
> situations on my development platform (with its slow disks) where there were
> three merges happening at the same time, which caused all indexing activity
> to cease for several minutes.  This in turn would cause JDBC to time out and
> drop the connection to the database, which caused DIH to fail and roll back
> the entire import about two hours (two thirds of the way) in.
>
> With a mergeFactor of 35, there are no second-level merges and no
> third-level merges.  I can do a complete reindex successfully even on a
> system with slow disks.
>
> In production, one shard (out of six) is optimized every day to eliminate
> deleted documents.  When I have to reindex everything, I will typically go
> through and manually optimize each shard in turn after it's done.  This is
> the point where I discovered this two-pass problem.
>
> I don't want to do a full-import with optimize=true, because all six large
> shards build at the same time in a Xen environment.  The I/O storm that
> results from three optimizes happening on each host at the same time and
> then replicating to similar Xen hosts is very bad.
>
> I have now set maxMergeAtOnceExplicit to 105.  I think that is probably
> enough, given that I currently do not experience any second-level
> merges.  When my index gets big enough, I will increase the RAM buffer.  By
> then I will probably have more memory, so the first-level merges can still
> happen entirely from the I/O cache.
>
> Shawn
>
>
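The first-, second- and third-level merges described above can be counted with a simplified log-style model. This is an assumption, not Lucene's real merge selection. In the model, every flush writes an equal-sized segment, and whenever mergeFactor segments of the same level exist they are merged into one segment of the next level. The number of level-k merges is then floor(flushes / mergeFactor^k). The MergeLevels class is hypothetical. The flush count of 1,100 is made up. It was chosen only because it lies between 10^3 = 1,000 and 35^2 = 1,225, which is the range where this model reproduces the pattern in the thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified log-style merge model (an assumption, not Lucene's actual policy):
 * equal-sized flushed segments, and every mergeFactor segments of one level
 * are merged into a single segment of the next level.
 */
class MergeLevels {

    /** Returns merge counts per level: index 0 = first-level merges, index 1 = second-level, ... */
    static List<Integer> mergesPerLevel(long flushedSegments, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<Integer> merges = new ArrayList<>();
        long count = flushedSegments / mergeFactor; // number of first-level merges
        while (count > 0) {
            merges.add((int) count);
            count /= mergeFactor; // each next level needs mergeFactor segments of this level
        }
        return merges;
    }

    public static void main(String[] args) {
        long flushes = 1100; // made-up flush count, for illustration only
        for (int mergeFactor : new int[] {10, 35}) {
            System.out.println("mergeFactor " + mergeFactor + ": merges per level = "
                    + mergesPerLevel(flushes, mergeFactor));
        }
    }
}
```

Under this model, mergeFactor 10 gives [110, 11, 1]: first-, second- and third-level merges. A mergeFactor of 35 gives [31], which is first-level merges only. In the same model, a RAM buffer twice as large roughly halves the flush count, which pushes the next merge level further out as the index grows.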
