I am using the IndexMergeTool from Solr, with the command below:

java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar org.apache.lucene.misc.IndexMergeTool

The heap size is 32GB. There are more than 20 million documents in the two cores.

Regards,
Edwin

On 21 November 2017 at 21:54, Shawn Heisey <apa...@elyograg.org> wrote:

> On 11/20/2017 9:35 AM, Zheng Lin Edwin Yeo wrote:
>
>> Does anyone know how long merging in Solr usually takes?
>>
>> I am currently merging about 3.5TB of data, and it has been running for
>> more than 28 hours and it is not completed yet. The merging is running on
>> SSD disk.
>
> The following will apply if you mean Solr's "optimize" feature when you
> say "merging".
>
> In my experience, merging proceeds at about 20 to 30 megabytes per second
> -- even if the disks are capable of far faster data transfer. Merging is
> not just copying the data. Lucene is completely rebuilding very large data
> structures, and *not* including data from deleted documents as it does so.
> It takes a lot of CPU power and time.
>
> If we average the data rates I've seen to 25, then that would indicate
> that an optimize on a 3.5TB index is going to take about 39 hours, and might
> take as long as 48 hours. And if you're running SolrCloud with multiple
> replicas, multiply that by the number of copies of the 3.5TB index. An
> optimize on a SolrCloud collection handles one shard replica at a time and
> works its way through the entire collection.
>
> If you are merging different indexes *together*, which a later message
> seems to state, then the actual Lucene operation is probably nearly
> identical, but I'm not really familiar with it, so I cannot say for sure.
>
> Thanks,
> Shawn
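
For reference, the time estimate in the quoted message can be reproduced with simple arithmetic. The sketch below is only an illustration: the class and method names (MergeTimeEstimate, estimateHours) are made up for this example, and it assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and a constant merge rate, which a real merge will not hold exactly.

```java
public class MergeTimeEstimate {

    // Hypothetical helper: hours needed to process indexBytes at a steady
    // rate of megabytesPerSecond (decimal megabytes, 10^6 bytes each).
    static double estimateHours(double indexBytes, double megabytesPerSecond) {
        double seconds = indexBytes / (megabytesPerSecond * 1_000_000.0);
        return seconds / 3600.0;
    }

    public static void main(String[] args) {
        double indexBytes = 3.5e12;          // 3.5TB, assuming decimal terabytes
        double[] rates = {20.0, 25.0, 30.0}; // MB/s range reported in the thread

        for (double rate : rates) {
            System.out.printf("%.0f MB/s -> %.1f hours%n",
                    rate, estimateHours(indexBytes, rate));
        }
    }
}
```

At 25 MB/s this works out to roughly 39 hours, and at 20 MB/s to a little over 48 hours, which matches the figures quoted above. With several replicas of the same index, the result would be multiplied by the number of copies.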