OK, that sounds like a good solution! You can also have CMS (ConcurrentMergeScheduler) limit how many merges are allowed to run at once, if your I/O system has trouble with that much concurrency.
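For illustration, a minimal sketch of how that limit might be expressed in solrconfig.xml, assuming a Solr release that passes init args through to the merge scheduler. The values below are hypothetical and would need tuning for the actual disks:

```xml
<!-- Sketch only: goes in the index settings section of solrconfig.xml. -->
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <!-- Assumed value: run at most one merge thread at a time. -->
  <int name="maxThreadCount">1</int>
  <!-- Assumed value: allow a few merges to queue before indexing threads are stalled. -->
  <int name="maxMergeCount">3</int>
</mergeScheduler>
```

With fewer merge threads, each merge gets the disk largely to itself, at the cost of merges queuing up during heavy indexing.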
Mike McCandless

http://blog.mikemccandless.com

On Mon, Jun 20, 2011 at 6:29 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 6/20/2011 3:18 PM, Michael McCandless wrote:
>>
>> With segmentsPerTier at 35 you will easily cross 70 segs in the index...
>> If you want optimize to run in a single merge, I would lower
>> segmentsPerTier and mergeAtOnce (maybe back to the 10 default), and set
>> your maxMergeAtOnceExplicit to 70 or higher...
>>
>> Lower mergeAtOnce means merges run more frequently but for shorter
>> time, and, your searching should be faster (than 35/35) since there
>> are fewer segments to visit.
>
> Thanks again for more detailed information. There is method to my madness,
> which I will now try to explain.
>
> With a value of 10, the reindex involves enough merges that there are
> many second-level merges, and a third-level merge. I was running into
> situations on my development platform (with its slow disks) where there were
> three merges happening at the same time, which caused all indexing activity
> to cease for several minutes. This in turn would cause JDBC to time out and
> drop the connection to the database, which caused DIH to fail and roll back
> the entire import about two hours (two thirds) in.
>
> With a mergeFactor of 35, there are no second-level merges, and no
> third-level merges. I can do a complete reindex successfully even on a
> system with slow disks.
>
> In production, one shard (out of six) is optimized every day to eliminate
> deleted documents. When I have to reindex everything, I will typically go
> through and manually optimize each shard in turn after it's done. This is
> the point where I discovered this two-pass problem.
>
> I don't want to do a full-import with optimize=true, because all six large
> shards build at the same time in a Xen environment. The I/O storm that
> results from three optimizes happening on each host at the same time and
> then replicating to similar Xen hosts is very bad.
>
> I have now set maxMergeAtOnceExplicit to 105. I think that is probably
> enough, given that I currently do not experience any second-level
> merges. When my index gets big enough, I will increase the RAM buffer. By
> then I will probably have more memory, so the first-level merges can still
> happen entirely from I/O cache.
>
> Shawn
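
The merge-level reasoning quoted above can be checked with a little arithmetic. The sketch below is not Lucene code and does not model TieredMergePolicy; it assumes a simplified log-style policy in which every mergeFactor segments at one level are merged into a single segment at the next level. The class name, method names, and the flush count of 1200 are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeCascade {

    /**
     * Counts merges per level under the simplified model: element 0 is the
     * number of first-level merges, element 1 the second-level merges, etc.
     */
    static List<Integer> mergesPerLevel(int flushes, int mergeFactor) {
        List<Integer> merges = new ArrayList<>();
        int segments = flushes;
        while (segments >= mergeFactor) {
            int merged = segments / mergeFactor; // each merge consumes mergeFactor segments
            merges.add(merged);
            segments = merged;                   // merged segments feed the next level
        }
        return merges;
    }

    /** Segments left in the index once all cascading merges have finished. */
    static int segmentsLeft(int flushes, int mergeFactor) {
        int left = 0;
        int segments = flushes;
        while (segments > 0) {
            left += segments % mergeFactor;      // segments at this level that never merged
            segments /= mergeFactor;
        }
        return left;
    }

    public static void main(String[] args) {
        int flushes = 1200; // hypothetical number of flushed segments in one reindex
        System.out.println(mergesPerLevel(flushes, 10)); // prints [120, 12, 1]
        System.out.println(mergesPerLevel(flushes, 35)); // prints [34]
        System.out.println(segmentsLeft(flushes, 35));   // prints 44
    }
}
```

Under these assumptions a factor of 10 produces second-level and third-level merges, while a factor of 35 produces only first-level merges and leaves 44 segments, which is below the maxMergeAtOnceExplicit of 105, so an optimize could run as a single merge.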