Really, let's back up here though. This sure seems like an XY problem.
You're merging indexes that will eventually be something on the order
of 3.5TB. I claim that an index of that size is very difficult to work
with effectively. _Why_ do you want to do this? Do you have any
evidence that you'll be able to effectively use it?

And Shawn tells you that the result will be one large segment. If you
replace documents in that index, it can accumulate around 3.4975TB of
wasted space (deleted documents) before that segment becomes eligible
for merging again, see:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/.
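
To put rough numbers on that, here is a back-of-the-envelope sketch. It
assumes the default 5GB maximum segment size and, as I read the article
above, that an over-sized segment is only considered for merging once
its live documents drop below half of that. The 20 to 30 MB per second
rate is the one Shawn quotes below. The function names are made up for
illustration, and decimal units (1TB = 1000GB) are assumed.

```python
# Back-of-the-envelope estimates; every constant here is an assumption,
# not a measurement taken from the index in question.

def wasted_before_merge_gb(index_gb, max_segment_gb=5.0):
    """Deleted-document space a single huge segment can hold before its
    live documents fall below half the max segment size (assumed rule)."""
    live_threshold_gb = max_segment_gb / 2
    return index_gb - live_threshold_gb

def merge_hours(index_gb, rate_mb_per_s):
    """Hours needed to rewrite index_gb of data at a steady rate."""
    seconds = index_gb * 1000 / rate_mb_per_s
    return seconds / 3600

if __name__ == "__main__":
    index_gb = 3500.0  # roughly 3.5TB
    print(f"wasted space before merge: {wasted_before_merge_gb(index_gb):.1f} GB")
    for rate in (20, 30):
        print(f"merge at {rate} MB/s: about {merge_hours(index_gb, rate):.1f} hours")
```

With those assumptions the wasted space comes out at 3497.5GB, which is
where the 3.4975T figure comes from, and a single pass over the data
takes somewhere between one and two days.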

You already know that merging is extremely painful. This sure seems
like a case where the evidence is mounting that you would be far
better off sharding and _not_ merging.

FWIW,
Erick

On Wed, Nov 22, 2017 at 8:45 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 11/21/2017 9:10 AM, Zheng Lin Edwin Yeo wrote:
>> I am using the IndexMergeTool from Solr, from the command below:
>>
>> java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar
>> org.apache.lucene.misc.IndexMergeTool
>>
>> The heap size is 32GB. There are more than 20 million documents in the two
>> cores.
>
> I have looked at IndexMergeTool, and confirmed that it does its job in
> exactly the same way that Solr does an optimize, so I would still expect
> a rate of 20 to 30 MB per second, unless it's running on REALLY old
> hardware that can't transfer data that quickly.
>
> Thanks,
> Shawn
>
