Hi,

I found that IndexMergeTool.java contains this line, which sets maxNumSegments to 1:
    writer.forceMerge(1);

Does this mean that there will always be only one segment after the merge?
Is there any way to let the merge produce multiple segments, each of a
certain size? For example, what if we want each segment to be 20GB?

Regards,
Edwin

On 23 November 2017 at 20:35, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

> Hi Shawn,
>
> Thanks for the info. We will most likely be doing sharding when we migrate
> to Solr 7.1.0, and re-index the data.
>
> But as Solr 7.1.0 is still not ready to index EML files yet due to this
> JIRA, https://issues.apache.org/jira/browse/SOLR-11622, we have to make
> use with our current Solr 6.5.1 first, which was already created without
> sharding from the start.
>
> Regards,
> Edwin
>
> On 23 November 2017 at 12:50, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 11/22/2017 6:19 PM, Zheng Lin Edwin Yeo wrote:
>>
>>> I'm doing the merging on the SSD drive, the speed should be ok?
>>
>> The speed of virtually all modern disks will have almost no influence on
>> the speed of the merge. The bottleneck isn't disk transfer speed, it's the
>> operation of the merge code in Lucene.
>>
>> As I said earlier in this thread, a merge is **NOT** just a copy. Lucene
>> must completely rebuild the data structures of the index to incorporate all
>> of the segments of the source indexes into a single segment in the target
>> index, while simultaneously *excluding* information from documents that
>> have been deleted.
>>
>> The best speed I have ever personally seen for a merge is 30 megabytes
>> per second. This is far below the sustained transfer rate of a typical
>> modern SATA disk. SSD is capable of far faster data transfer ...but it
>> will NOT make merges go any faster.
>>
>>> We need to merge because the data are indexed in two different
>>> collections, and we need them to be under the same collection, so that
>>> we can do things like faceting more accurately.
>>> Will sharding alone achieve this? Or do we have to merge first before we
>>> do the sharding?
>>
>> If you want the final index to be sharded, it's typically best to index
>> from scratch into a new empty collection that has the number of shards you
>> want. The merging tool you're using isn't aware of concepts like shards.
>> It combines everything into a single index.
>>
>> It's not entirely clear what you're asking with the question about
>> sharding alone. Making a guess: I have never heard of facet accuracy
>> being affected by whether or not the index is sharded. If that *is*
>> possible, then I would expect an index that is NOT sharded to have better
>> accuracy.
>>
>> Thanks,
>> Shawn
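
For a sense of scale, the 30 MB/s figure quoted above can be turned into a rough lower bound on merge time. The sketch below is plain arithmetic, not Lucene code: the class and method names are made up for illustration, 20GB is the segment size from the question, and 1GB is assumed to be 1024MB. Since 30 MB/s is the best rate Shawn reports having seen, a real merge may well take longer.

```java
import java.time.Duration;

// Rough merge-time estimate: index size divided by merge throughput.
// Hypothetical helper for illustration; it does not call Lucene or Solr.
final class MergeTimeEstimate {

    // Best merge rate reported in this thread (30 MB/s); treat it as an upper bound on speed.
    static final double BEST_OBSERVED_MB_PER_SEC = 30.0;

    /** Returns the estimated merge time for an index of the given size in megabytes. */
    static Duration estimate(double indexSizeMb, double mbPerSec) {
        if (indexSizeMb < 0 || mbPerSec <= 0) {
            throw new IllegalArgumentException("size must be >= 0 and rate must be > 0");
        }
        long millis = Math.round(indexSizeMb / mbPerSec * 1000.0);
        return Duration.ofMillis(millis);
    }

    public static void main(String[] args) {
        // 20GB is the segment size mentioned in the question; 1GB = 1024MB is an assumption.
        double sizeMb = 20 * 1024.0;
        Duration d = estimate(sizeMb, BEST_OBSERVED_MB_PER_SEC);
        System.out.printf("%.0f MB at %.0f MB/s: about %d min %d s%n",
                sizeMb, BEST_OBSERVED_MB_PER_SEC, d.toMinutes(), d.getSeconds() % 60);
    }
}
```

By this estimate, 20GB of index data needs a little over 11 minutes of merge time at best, whatever the disk is capable of.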