Reitzel, Charles <charles.reit...@tiaa-cref.org> wrote:
> Question, Toke: in your "immutable" cases, don't the benefits of
> optimizing come mostly from eliminating deleted records?

Not for us. We have about 1 deleted document for every 1,000 to 10,000 standard 
documents.

> Is there any material difference in heap, CPU, etc. between 1, 5 or 10 
> segments?
> I.e. at how many segments/shard do you see a noticeable performance hit?

What matters is whether there is exactly 1 segment or more than 1, and whether 
there are 0 deleted records or more than 0.

Having 1 segment means that String faceting benefits from not having to map 
between segment ordinals and global ordinals. That's a speed increase (just a 
null check instead of a memory lookup) as well as a heap requirement reduction: 
We save 2GB+ heap per shard on that account (our current heap size is 8GB). 
Granted, we facet on 600M values for one of the fields, which I don't think is 
very common.
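To make the ordinal mapping concrete, here is a minimal Java sketch. It assumes a naive design with plain int arrays, and the class OrdinalMapSketch and its methods are hypothetical, not Solr's or Lucene's API. Each segment numbers its own unique terms, so counting facets across segments needs a per-segment table that translates segment ordinals to global ordinals. With a single segment the table can be null and the segment ordinal is used directly.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of String facet counting over a multi-segment index.
 * Not Solr/Lucene code: Lucene's real ordinal map is stored far more compactly.
 */
public class OrdinalMapSketch {

    /** segToGlobal[segment][segmentOrd] = global ordinal; null when there is only one segment. */
    private final int[][] segToGlobal;
    private final int[] counts;

    public OrdinalMapSketch(int[][] segToGlobal, int uniqueTerms) {
        this.segToGlobal = segToGlobal;
        this.counts = new int[uniqueTerms];
    }

    /** Count one occurrence of the term with the given segment ordinal. */
    public void count(int segment, int segmentOrd) {
        // One segment: a null check only. Several segments: an extra memory lookup.
        int globalOrd = (segToGlobal == null) ? segmentOrd : segToGlobal[segment][segmentOrd];
        counts[globalOrd]++;
    }

    public int[] counts() {
        return Arrays.copyOf(counts, counts.length);
    }

    public static void main(String[] args) {
        // Segment 0 holds terms {a, c}, segment 1 holds {b, c}; global order is a=0, b=1, c=2.
        OrdinalMapSketch multi = new OrdinalMapSketch(new int[][] { {0, 2}, {1, 2} }, 3);
        multi.count(0, 1); // "c" in segment 0
        multi.count(1, 1); // "c" in segment 1
        multi.count(1, 0); // "b" in segment 1
        System.out.println(Arrays.toString(multi.counts())); // [0, 1, 2]

        // A fully optimized index: one segment, no mapping table at all.
        OrdinalMapSketch single = new OrdinalMapSketch(null, 3);
        single.count(0, 2);
        System.out.println(Arrays.toString(single.counts())); // [0, 0, 1]
    }
}
```

The mapping table grows with the number of unique values per segment, which is why it becomes expensive for a field with hundreds of millions of values and disappears entirely when there is only one segment.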

Having 0 deleted records helps for a related reason: the bitmap of deleted 
documents is then null, so checking whether a document is live is faster.
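In Lucene, LeafReader.getLiveDocs() returns null for a segment without deletions. The hypothetical Java sketch below uses java.util.BitSet as a stand-in for Lucene's Bits and a made-up sumLive method to show the usual check: when the bitmap is null, the test short-circuits and the bitmap is never read.

```java
import java.util.BitSet;

/** Hypothetical sketch; java.util.BitSet stands in for Lucene's live-docs Bits. */
public class LiveDocsSketch {

    /** Sums per-document values, skipping deleted documents. liveDocs == null means "no deletions". */
    public static long sumLive(long[] values, BitSet liveDocs) {
        long sum = 0;
        for (int doc = 0; doc < values.length; doc++) {
            // No deletions: the null test short-circuits and no bitmap lookup happens.
            if (liveDocs == null || liveDocs.get(doc)) {
                sum += values[doc];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 3, 4, 5};
        BitSet live = new BitSet(values.length);
        live.set(0, values.length); // all documents live...
        live.clear(3);              // ...except document 3, which was deleted
        System.out.println(sumLive(values, live)); // 11
        System.out.println(sumLive(values, null)); // 15
    }
}
```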

Most of the performance benefit probably comes from the freed memory. We have 
25 shards/machine, so sparing 2GB per shard gives us an extra 50GB of disk cache. 
The performance increase for that is 20-40%, guesstimated from some previous 
tests where we varied the disk cache size.
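The arithmetic behind the 50GB figure, as a tiny hypothetical Java helper, assumes that all heap saved per shard is handed back to the operating system and ends up as disk cache:

```java
/** Hypothetical helper: heap saved per shard times shards per machine. */
public class FreedCacheSketch {

    /** Assumes every freed GB of heap becomes OS disk cache. */
    public static int freedGb(int shardsPerMachine, int heapSavedPerShardGb) {
        return shardsPerMachine * heapSavedPerShardGb;
    }

    public static void main(String[] args) {
        System.out.println(freedGb(25, 2) + " GB"); // 50 GB
    }
}
```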

I doubt that there is much difference between 2, 5, 10 or even 20 segments. The 
people at UKWA are running some tests on different degrees of optimization of 
their 30-shard, TB-class index. You'll have to dig a bit, but there might be 
relevant results: https://github.com/ukwa/shine/tree/master/python/test-logs

> Also, I'm curious whether you have experimented much with the maxMergedSegmentMB
> and reclaimDeletesWeight properties of the TieredMergePolicy?

I have zero experience with that: We build the shards one at a time and don't 
touch them after that. 90% of our building power goes to Tika analysis, so 
there hasn't been an apparent need for tuning Solr's indexing.

- Toke Eskildsen
