Hi,

In IndexMergeTool.java, I found this line, which sets maxNumSegments
to 1:

writer.forceMerge(1);


Does this mean that there will always be only 1 segment after the
merge?

Is there any way to let the merge produce multiple segments, each of a
certain size? For example, what if we want each segment to be about
20GB?
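
To make the question concrete, here is a minimal sketch of the arithmetic I have in mind. It assumes that the argument to forceMerge is only an upper bound on the number of segments, and that the segments would come out roughly equal in size; neither assumption is something I have confirmed. The class name, the method name and the 90GB total are hypothetical and not part of Lucene or IndexMergeTool.

```java
// Hypothetical helper, NOT part of Lucene or IndexMergeTool.
// It only does the arithmetic for choosing a segment count.
final class SegmentCountEstimate {
    static final long GB = 1024L * 1024L * 1024L;

    // Smallest segment count for which equal-sized segments would each
    // be at or below the target size. Always returns at least 1.
    static int segmentsFor(long totalIndexBytes, long targetSegmentBytes) {
        if (totalIndexBytes < 0 || targetSegmentBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division: round up so no segment exceeds the target.
        long n = (totalIndexBytes + targetSegmentBytes - 1) / targetSegmentBytes;
        return (int) Math.max(1, Math.min(n, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long total = 90L * GB;  // assumed combined index size, for illustration only
        System.out.println(segmentsFor(total, 20L * GB));
    }
}
```

The result would then replace the 1 in writer.forceMerge(1), if that is a supported way to do this. Whether the resulting segments would really end up close to 20GB each presumably depends on the merge policy.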

Regards,
Edwin


On 23 November 2017 at 20:35, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
wrote:

> Hi Shawn,
>
> Thanks for the info. We will most likely be doing sharding when we migrate
> to Solr 7.1.0, and re-index the data.
>
> But as Solr 7.1.0 is not yet ready to index EML files due to this
> JIRA, https://issues.apache.org/jira/browse/SOLR-11622, we have to make
> do with our current Solr 6.5.1 first, where the collection was created
> without sharding from the start.
>
> Regards,
> Edwin
>
> On 23 November 2017 at 12:50, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 11/22/2017 6:19 PM, Zheng Lin Edwin Yeo wrote:
>>
>>> I'm doing the merging on an SSD drive, so the speed should be ok?
>>>
>>
>> The speed of virtually all modern disks will have almost no influence on
>> the speed of the merge.  The bottleneck isn't disk transfer speed, it's the
>> operation of the merge code in Lucene.
>>
>> As I said earlier in this thread, a merge is **NOT** just a copy. Lucene
>> must completely rebuild the data structures of the index to incorporate all
>> of the segments of the source indexes into a single segment in the target
>> index, while simultaneously *excluding* information from documents that
>> have been deleted.
>>
>> The best speed I have ever personally seen for a merge is 30 megabytes
>> per second.  This is far below the sustained transfer rate of a typical
>> modern SATA disk.  SSD is capable of far faster data transfer, but it
>> will NOT make merges go any faster.
>>
>>> We need to merge because the data are indexed in two different
>>> collections, and we need them to be under the same collection, so that
>>> we can do things like faceting more accurately.
>>> Will sharding alone achieve this? Or do we have to merge first before we
>>> do the sharding?
>>>
>>
>> If you want the final index to be sharded, it's typically best to index
>> from scratch into a new empty collection that has the number of shards you
>> want.  The merging tool you're using isn't aware of concepts like shards.
>> It combines everything into a single index.
>>
>> It's not entirely clear what you're asking with the question about
>> sharding alone.  Making a guess:  I have never heard of facet accuracy
>> being affected by whether or not the index is sharded.  If that *is*
>> possible, then I would expect an index that is NOT sharded to have better
>> accuracy.
>>
>> Thanks,
>> Shawn
>>
>>
>
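
For a rough sense of what the 30 megabytes per second figure quoted above implies, the following is a minimal sketch of the arithmetic. It assumes the merge rate stays constant and that the time depends only on the total index size, which is a simplification. The class name, the method name and the 100GB example size are hypothetical; the thread does not state the actual index size.

```java
import java.time.Duration;

// Hypothetical helper for back-of-the-envelope estimates only.
final class MergeTimeEstimate {
    static final long MB = 1024L * 1024L;

    // Time to process indexBytes at a constant rate of mbPerSecond,
    // rounded up to the next whole second.
    static Duration estimate(long indexBytes, double mbPerSecond) {
        if (indexBytes < 0 || mbPerSecond <= 0) {
            throw new IllegalArgumentException("size and rate must be positive");
        }
        double seconds = indexBytes / (mbPerSecond * MB);
        return Duration.ofSeconds((long) Math.ceil(seconds));
    }

    public static void main(String[] args) {
        long indexBytes = 100L * 1024L * MB;  // assumed 100GB index, for illustration only
        Duration d = estimate(indexBytes, 30.0);
        System.out.println(d.toMinutes() + " minutes");
    }
}
```

Since 30 megabytes per second is described as the best case seen, an estimate like this is better read as a lower bound on the merge time than as a prediction.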
