You can index and optimize at the same time.  The current limitation,
or pause, occurs while the RAM buffer is being flushed to disk; however,
that's changing with the DocumentsWriterPerThread implementation (see
LUCENE-2324).
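
A toy model may help show why indexing and merging can overlap.  This is a
hypothetical Python sketch, not Lucene's API: `ToyIndex`, `begin_merge` and
`commit_merge` are illustrative names.  It assumes, as in Lucene, that flushed
segments are immutable, so a merge can work from a snapshot of the existing
segments while new documents keep filling the RAM buffer and get flushed into
new segments.  Deletes, locking and real threads are left out.

```python
class ToyIndex:
    """Hypothetical, much-simplified model of a Lucene-style index.

    New documents collect in a RAM buffer; a flush turns the buffer into a
    new, immutable segment. Because flushed segments never change, a merge
    can read a snapshot of them while new documents keep arriving.
    """

    def __init__(self, max_buffered_docs: int = 2) -> None:
        self.segments: list[tuple[str, ...]] = []
        self.buffer: list[str] = []
        self.max_buffered_docs = max_buffered_docs

    def add(self, doc_id: str) -> None:
        self.buffer.append(doc_id)
        if len(self.buffer) >= self.max_buffered_docs:
            self.flush()  # in this model, the flush is where adds briefly pause

    def flush(self) -> None:
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def begin_merge(self) -> int:
        # Remember how many segments exist now; only these will be merged.
        return len(self.segments)

    def commit_merge(self, n_merged: int) -> None:
        if n_merged == 0:
            return  # nothing was snapshotted, nothing to replace
        # Combine the snapshotted segments into one; segments flushed while
        # the merge was "running" are kept, untouched, after it.
        merged = tuple(d for seg in self.segments[:n_merged] for d in seg)
        self.segments = [merged] + self.segments[n_merged:]


index = ToyIndex(max_buffered_docs=2)
for d in ["a", "b", "c", "d"]:
    index.add(d)                  # two segments: ("a", "b"), ("c", "d")
snapshot = index.begin_merge()    # merge starts on those two segments
for d in ["e", "f"]:
    index.add(d)                  # indexing continues during the merge
index.commit_merge(snapshot)
print(index.segments)             # [('a', 'b', 'c', 'd'), ('e', 'f')]
```

The design point is that the merge only ever replaces the segments it started
with, so documents added mid-merge are never lost or blocked by it.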

On Tue, Apr 12, 2011 at 8:34 AM, Shawn Heisey <s...@elyograg.org> wrote:
> On 4/12/2011 6:21 AM, stockii wrote:
>>
>> Hello.
>>
>> When I start an optimize (which takes more than 4 hours), no updates from
>> DIH are possible.
>> I thought Solr would copy the whole index, optimize the copy, and not lock
>> the index while optimizing ... =(
>>
>> Is there any way to do both at the same time?
>
> You can't index and optimize at the same time, and I'm pretty sure that
> there isn't any way to make it possible that wouldn't involve a major
> rewrite of Lucene, and possibly Solr.  The devs would have to say
> otherwise if my understanding is wrong.
>
> The optimize takes place at the Lucene level.  I can't give you much
> in-depth information, but I can give you some high-level stuff.  What it's
> doing is equivalent to a merge, down to one segment.  This is not the same
> as a straight file copy.  It must read the entire Lucene data structure and
> build a new one from scratch.  The process removes deleted documents and
> will also upgrade the version number of the index if it was written with an
> older version of Lucene.  It's very likely that the reading side of the
> process is nearly as comprehensive as the CheckIndex program, but it also
> has to write out a new index segment.
>
> The net result -- the process gives your CPU and especially your I/O
> subsystem a workout, simultaneously.  If you were to make your I/O subsystem
> faster, you would probably see a major improvement in your optimize times.
>
> On my installation, it takes about 11 minutes to optimize one of my 16GB
> shards, each with 9 million docs.  These live in virtual machines that are
> stored on a six-drive RAID10 array using 7200RPM SATA disks.  One of my
> pie-in-the-sky upgrade dreams is to replace that with a four-drive RAID10
> array of SSDs, with the other two drives being regular SATA disks for a
> mirrored OS partition.
>
> Thanks,
> Shawn
>
>
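
The optimize described above (read every segment, drop deleted documents,
write one new segment in the current format) can be pictured with a toy
Python model.  `Segment`, `force_merge_to_one` and `CURRENT_FORMAT` are
hypothetical names for illustration only; real Lucene segments, codecs and
version numbers are far more involved.

```python
from dataclasses import dataclass

# Hypothetical "current" on-disk format version; real Lucene versions differ.
CURRENT_FORMAT = 3


@dataclass(frozen=True)
class Segment:
    """Toy, immutable segment: a format version plus (doc_id, deleted) pairs."""
    format_version: int
    docs: tuple[tuple[str, bool], ...]


def force_merge_to_one(segments: list[Segment]) -> Segment:
    """Read every segment in full and write one new segment.

    Deleted documents are dropped and the result is stamped with the
    current format version. The input segments are left untouched.
    """
    live_docs = tuple(
        (doc_id, False)
        for segment in segments
        for doc_id, deleted in segment.docs
        if not deleted
    )
    return Segment(format_version=CURRENT_FORMAT, docs=live_docs)


old_index = [
    Segment(2, (("a", False), ("b", True))),  # written by an older version
    Segment(3, (("c", False),)),
    Segment(3, (("d", True), ("e", False))),
]
optimized = force_merge_to_one(old_index)
print(optimized)
# Segment(format_version=3, docs=(('a', False), ('c', False), ('e', False)))
```

Even in this toy, every document in every segment has to be visited, which is
why an optimize costs roughly a full read plus a full write of the index.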

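A back-of-the-envelope check of the figures quoted above (a 16GB shard with 9
million docs, optimized in about 11 minutes) shows the sustained rates
involved.  This assumes 16GB means 16 x 10^9 bytes and that the whole index is
read once and a same-sized index written once; deletions would shrink the
write side, so treat the numbers as rough estimates.

```python
def optimize_throughput(index_bytes: float, num_docs: int, seconds: float):
    """Rough rates for one optimize run.

    Assumes the whole index is read once and a same-sized index is written
    once; deletions or compression would shrink the write side.
    """
    read_mb_per_s = index_bytes / seconds / 1e6   # decimal megabytes
    total_mb_per_s = 2 * read_mb_per_s            # read + write
    docs_per_s = num_docs / seconds
    return read_mb_per_s, total_mb_per_s, docs_per_s


# Figures from the message: 16GB shard, 9 million docs, ~11 minutes.
read_rate, io_rate, doc_rate = optimize_throughput(16e9, 9_000_000, 11 * 60)
print(f"read ~{read_rate:.1f} MB/s, read+write ~{io_rate:.1f} MB/s, ~{doc_rate:,.0f} docs/s")
# read ~24.2 MB/s, read+write ~48.5 MB/s, ~13,636 docs/s
```

Sustaining that mixed read/write load for the whole run, on top of the CPU
work, is consistent with faster storage shortening optimize times.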