This is the current config:

        <indexConfig>
                <ramBufferSizeMB>100</ramBufferSizeMB>
                <writeLockTimeout>10000</writeLockTimeout>
                <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler" />
                <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
                        <int name="maxMergeAtOnce">10</int>
                        <int name="segmentsPerTier">10</int>
                </mergePolicyFactory>
        </indexConfig>

We index in bulk, so after indexing about 4 million documents over a week (the 
OCR takes a long time), we normally end up with about 60-70 segments with this 
configuration.
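
If we wanted fewer, larger segments without running optimize, my understanding 
of TieredMergePolicy is that lowering segmentsPerTier and raising 
maxMergedSegmentMB (which I believe defaults to 5120, i.e. 5GB per segment) 
trades more merge I/O during indexing for a lower segment count. Something 
like this, untested on our data:

        <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
                <int name="maxMergeAtOnce">5</int>
                <int name="segmentsPerTier">5</int>
                <double name="maxMergedSegmentMB">10240</double>
        </mergePolicyFactory>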

> On 3 Mar 2017, at 02:42, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> 
> What do you have for merge configuration in solrconfig.xml? You should
> be able to tune it to approximately whatever segment count you want
> without doing a grand optimize:
> https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-MergingIndexSegments
> 
> Regards,
>   Alex.
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
> 
> 
> On 2 March 2017 at 16:37, Caruana, Matthew <mcaru...@icij.org> wrote:
>> Yes, we already do it outside Solr. See https://github.com/ICIJ/extract 
>> which we developed for this purpose. My guess is that the documents are very 
>> large, as you say.
>> 
>> Optimising was always an attempt to bring down the number of segments from 
>> 60+. Not sure how else to do that.
>> 
>>> On 2 Mar 2017, at 7:42 pm, Michael Joyner <mich...@newsrx.com> wrote:
>>> 
>>> You can solve the disk space and time issues by specifying multiple 
>>> segments to optimize down to instead of a single segment.
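>>> 
>>> For example, posting the following to the update handler (maxSegments can 
>>> also be passed as a URL parameter, e.g. /update?optimize=true&maxSegments=16):
>>> 
>>>         <optimize maxSegments="16" />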
>>> 
>>> When we reindex, we have to optimize or we end up with hundreds of 
>>> segments and horrible performance.
>>> 
>>> We optimize down to 16 segments or so; that avoids the 3x disk space 
>>> blow-up and usually runs in a decent amount of time. (We have >50 
>>> million articles in one of our Solr indexes.)
>>> 
>>> 
>>>> On 03/02/2017 10:20 AM, David Hastings wrote:
>>>> Agreed, and the fact that it takes three times the space is part of the
>>>> reason it takes so long: that 190GB index ends up writing another 380GB
>>>> before it compresses down and deletes the leftover files. It's a pretty
>>>> hefty operation.
>>>> 
>>>> On Thu, Mar 2, 2017 at 10:13 AM, Alexandre Rafalovitch <arafa...@gmail.com>
>>>> wrote:
>>>> 
>>>>> The optimize operation is no longer generally recommended for Solr,
>>>>> as the background merges have gotten a lot smarter.
>>>>> 
>>>>> It is an extremely expensive operation that can require up to three
>>>>> times the index's disk space while it runs.
>>>>> 
>>>>> This is not to say yours isn't a valid question, which I am leaving
>>>>> to others to respond to.
>>>>> 
>>>>> Regards,
>>>>>   Alex.
>>>>> ----
>>>>> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>>>>> 
>>>>> 
>>>>>> On 2 March 2017 at 10:04, Caruana, Matthew <mcaru...@icij.org> wrote:
>>>>>> I’m currently performing an optimise operation on a ~190GB index with
>>>>>> about 4 million documents. The process has been running for hours.
>>>>>> 
>>>>>> This is surprising, because the machine is an EC2 r4.xlarge with four
>>>>>> cores and 30GB of RAM, 24GB of which is allocated to the JVM.
>>>>>> 
>>>>>> The load average has been steady at about 1.3. Memory usage is 25% or
>>>>>> less the whole time. iostat reports ~6% util.
>>>>>> 
>>>>>> What gives?
>>>>>> 
>>>>>> Running Solr 6.4.1.
>>> 
