On 3/2/2017 8:04 AM, Caruana, Matthew wrote:
> I’m currently performing an optimise operation on a ~190GB index with about 4 
> million documents. The process has been running for hours.
>
> This is surprising, because the machine is an EC2 r4.xlarge with four cores 
> and 30GB of RAM, 24GB of which is allocated to the JVM.
>
> The load average has been steady at about 1.3. Memory usage is 25% or less 
> the whole time. iostat reports ~6% util.
>
> What gives?

On one of my systems, which is running 6.3.0, the optimize of a 50GB
index takes 1.73 hours.  This system has very fast storage -- six SATA
drives in RAID10.  The machine has eight 2.8GHz CPU cores (two Intel
E5440), 64GB of memory, a 13GB heap, and over 700GB of total index
data.  I would like more memory, but the machine is maxed, and this is
the dev server, which doesn't need to perform as well as production.

As others have said, an optimize rewrites the whole index.  The optimize
does NOT proceed at full disk I/O rate, though.  The speed of the disks
has very little influence on optimize speed unless they are REALLY
slow.  Any modern disk should be able to keep up easily.

It's not just a straight copy of data.  Lucene must do a heavy amount
of data processing.  Except for the fact that the source data is in a
slightly different form and text analysis does not need to be
repeated, what Lucene ends up doing is a lot like the initial indexing
process.  All existing data must be examined, minus deleted documents.
A new term list for the optimized segment (which covers the ENTIRE
index dataset) must be built.  That term list will be a significant
portion of the total size, and is likely to contain millions or
billions of terms for an index this size -- that takes time to
process.  The rest of the files that make up an index segment also
require significant processing to rewrite into the new segment.
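
If it helps to make that concrete, here is a rough sketch of the
Lucene call that an optimize boils down to -- a forceMerge down to a
single segment.  The index path and analyzer below are just
placeholders, not anything taken from your setup:

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class OptimizeSketch {
      public static void main(String[] args) throws Exception {
        // Placeholder path -- point this at a real index directory.
        try (FSDirectory dir =
                 FSDirectory.open(Paths.get("/var/solr/data/core1/index"));
             IndexWriter writer = new IndexWriter(dir,
                 new IndexWriterConfig(new StandardAnalyzer()))) {
          // forceMerge(1) reads every live document out of the existing
          // segments and rewrites everything into one new segment -- the
          // work described above.
          writer.forceMerge(1);
          writer.commit();
        }
      }
    }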

An optimize is a purely Lucene operation, so I do not know whether the
bug in Solr 6.4.1 that causes high CPU usage affects it directly.  It
CAN definitely affect it indirectly, by making the CPU less available.

https://issues.apache.org/jira/browse/SOLR-10130

The statement you might have heard that an optimize can take 3x the
space is sort of true, but not in the way most people think.  It is true
that an optimize might result in total space consumption that's
temporarily three times the *final* index size, but when looking at the
*starting* index size, the most it should ever take is double.  It is a
good idea to always plan on having 3x the disk space for your index,
though.  There are certain situations that you can experience, even
during normal operation when not running an optimize, where the index
can grow to triple size before it shrinks.
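
To put hypothetical numbers on that: if half of a 200GB index is
deleted documents, the optimized index ends up around 100GB.  While
the optimize runs, the old segments and the new one exist side by
side, so disk usage peaks near 300GB -- three times the *final* size,
but only 1.5 times the *starting* size.  With no deleted documents at
all, the peak is simply double the starting size.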

Another issue which might be relevant:  Assuming that this 190GB index
is the only index on the machine, you've only left yourself with 6GB of
RAM to cache 190GB of index data.  This will be even less if the server
has other software running on it.  That's not enough RAM for good
general performance.  If the machine has more indexes than just this
190GB, then the situation is even worse.

General performance will get worse during an optimize on most systems. 
I cannot say for sure that having too little system memory for good
caching will cause an optimize to be very slow.  I think the processing
involved might make it somewhat immune to that particular problem, if
the optimize is the only thing the server is doing.  If the server is
busy with queries and/or indexing during an optimize, I would expect a
very low memory situation like that to slow EVERYTHING down.

https://wiki.apache.org/solr/SolrPerformanceProblems

On my 6.3.0 dev system, optimizing a 190GB index would take more than
six hours.  With so little memory, and on 6.4.1 with its CPU bug, it
might take even longer.

The 6.4.2 release that fixes the performance bug should be available
sometime in the next week or so.

Thanks,
Shawn
