This is the current config:

  <indexConfig>
    <ramBufferSizeMB>100</ramBufferSizeMB>
    <writeLockTimeout>10000</writeLockTimeout>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler" />
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicyFactory>
  </indexConfig>
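For reference, with TieredMergePolicy the main lever for segment count is segmentsPerTier: lowering it (and keeping maxMergeAtOnce in step) makes the background merges hold the index to fewer segments, at the cost of more merge I/O during indexing. A sketch of a tighter variant of the config above, with illustrative values rather than recommendations:

```xml
<indexConfig>
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <writeLockTimeout>10000</writeLockTimeout>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler" />
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <!-- Fewer segments allowed per tier: background merging keeps the
         total segment count lower, trading extra merge I/O while indexing -->
    <int name="maxMergeAtOnce">5</int>
    <int name="segmentsPerTier">5</int>
  </mergePolicyFactory>
</indexConfig>
```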
We index in bulk, so after indexing about 4 million documents over a week (OCR takes a long time), we normally end up with about 60-70 segments under this configuration.

> On 3 Mar 2017, at 02:42, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>
> What do you have for merge configuration in solrconfig.xml? You should
> be able to tune it to - approximately - whatever you want without
> doing the grand optimize:
> https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-MergingIndexSegments
>
> Regards,
>    Alex.
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 2 March 2017 at 16:37, Caruana, Matthew <mcaru...@icij.org> wrote:
>> Yes, we already do it outside Solr. See https://github.com/ICIJ/extract,
>> which we developed for this purpose. My guess is that the documents are very
>> large, as you say.
>>
>> Optimising was always an attempt to bring down the number of segments from
>> 60+. Not sure how else to do that.
>>
>>> On 2 Mar 2017, at 7:42 pm, Michael Joyner <mich...@newsrx.com> wrote:
>>>
>>> You can solve the disk space and time issues by specifying multiple
>>> segments to optimize down to, instead of a single segment.
>>>
>>> When we reindex, we have to optimize or we end up with hundreds of segments
>>> and very poor performance.
>>>
>>> We optimize down to 16 segments or so, and it doesn't do the 3x disk
>>> space thing and usually runs in a decent amount of time. (We have >50
>>> million articles in one of our Solr indexes.)
>>>
>>>> On 03/02/2017 10:20 AM, David Hastings wrote:
>>>> Agreed, and the fact that it takes three times the space is part of the
>>>> reason it takes so long: that 190GB index ends up writing another 380GB
>>>> before it compresses down and deletes the leftover files.
>>>> It's a pretty hefty operation.
>>>>
>>>> On Thu, Mar 2, 2017 at 10:13 AM, Alexandre Rafalovitch <arafa...@gmail.com>
>>>> wrote:
>>>>
>>>>> The optimize operation is no longer recommended for Solr, as the
>>>>> background merges have become a lot smarter.
>>>>>
>>>>> It is an extremely expensive operation that can require up to three times
>>>>> the index's size in disk space during processing.
>>>>>
>>>>> This is not to say yours isn't a valid question, which I am leaving to
>>>>> others to answer.
>>>>>
>>>>> Regards,
>>>>>    Alex.
>>>>> ----
>>>>> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>>>>>
>>>>>
>>>>>> On 2 March 2017 at 10:04, Caruana, Matthew <mcaru...@icij.org> wrote:
>>>>>> I'm currently performing an optimise operation on a ~190GB index with
>>>>>> about 4 million documents. The process has been running for hours.
>>>>>>
>>>>>> This is surprising, because the machine is an EC2 r4.xlarge with four
>>>>>> cores and 30GB of RAM, 24GB of which is allocated to the JVM.
>>>>>>
>>>>>> The load average has been steady at about 1.3. Memory usage is 25% or
>>>>>> less the whole time. iostat reports ~6% util.
>>>>>>
>>>>>> What gives?
>>>>>>
>>>>>> Running Solr 6.4.1.
>>>
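Michael's suggestion above (optimize down to ~16 segments instead of 1) maps to a single update request using the maxSegments parameter. A sketch, assuming a core named mycore on the default local port; both names are placeholders to adjust for your deployment:

```shell
# Placeholder host and core name - adjust for your deployment.
SOLR_CORE_URL="http://localhost:8983/solr/mycore"

# Merge down to at most 16 segments instead of the default of 1; this
# avoids the worst of the full single-segment rewrite while still
# capping the segment count.
OPTIMIZE_URL="${SOLR_CORE_URL}/update?optimize=true&maxSegments=16"

echo "GET ${OPTIMIZE_URL}"
# curl "${OPTIMIZE_URL}"   # uncomment to run against a live Solr
```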