Reitzel, Charles <charles.reit...@tiaa-cref.org> wrote:
> Question, Toke: in your "immutable" cases, don't the benefits of
> optimizing come mostly from eliminating deleted records?

Not for us. We have about 1 deleted document for every 1,000 or 10,000 standard documents.

> Is there any material difference in heap, CPU, etc. between 1, 5 or 10
> segments?
> I.e. at how many segments/shard do you see a noticeable performance hit?

It really is either 1 or more than 1 segment, coupled with 0 deleted records or more than 0.

Having 1 segment means that String faceting benefits from not having to map between segment ordinals and global ordinals. That's a speed increase (just a null check instead of a memory lookup) as well as a heap requirement reduction: We save 2GB+ heap per shard on that account (our current heap size is 8GB). Granted, we facet on 600M values for one of the fields, which I don't think is very common.

Having 0 deleted records is related: the usual bitmap of deleted documents is then null, which means faster checks.

Most of the performance benefit probably comes from the freed memory. We have 25 shards/machine, so sparing 2GB per shard gives us an extra 50GB of disk cache. The performance increase from that is 20-40%, guesstimated from some previous tests where we varied the disk cache size.

I doubt that there is much difference between 2, 5, 10 or even 20 segments. The people at UKWA are running some tests on different degrees of optimization of their 30 shard TB-class index. You'll have to dig a bit, but there might be relevant results:
https://github.com/ukwa/shine/tree/master/python/test-logs

> Also, I'm curious if you have experimented much with the maxMergedSegmentMB
> and reclaimDeletesWeight properties of the TieredMergePolicy?

I have zero experience with that: We build the shards one at a time and don't touch them after that. 90% of our building power goes to Tika analysis, so there hasn't been an apparent need for tuning Solr's indexing.

- Toke Eskildsen
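
For readers who want the single-segment shortcut and the disk-cache arithmetic above spelled out, here is a minimal Python sketch. It is an illustration only, not Solr or Lucene code: the function names, the list-of-lists ordinal map, and the parameters are all hypothetical, chosen to mirror the "null check instead of a memory lookup" point and the 25 shards x 2GB figure from the message.

```python
from typing import List, Optional


def to_global_ordinal(segment_index: int,
                      segment_ord: int,
                      ordinal_maps: Optional[List[List[int]]]) -> int:
    """Map a per-segment term ordinal to an index-wide (global) ordinal.

    Hypothetical model: ordinal_maps[s][o] holds the global ordinal for
    ordinal o in segment s. With a single segment no map is needed, so it
    is None and the segment ordinal is already the global ordinal.
    """
    if ordinal_maps is None:
        # Single-segment case: only a null check, no lookup table in heap.
        return segment_ord
    # Multi-segment case: one extra memory lookup per ordinal.
    return ordinal_maps[segment_index][segment_ord]


def freed_memory_gb(shards_per_machine: int,
                    heap_saved_per_shard_gb: float) -> float:
    """Heap no longer needed across all shards, available as disk cache."""
    return shards_per_machine * heap_saved_per_shard_gb


# Two segments whose terms interleave in the global ordering (made-up data).
maps = [[0, 2], [1, 3]]
print(to_global_ordinal(1, 0, maps))   # lookup through the map
print(to_global_ordinal(0, 3, None))   # single segment: returned unchanged
print(freed_memory_gb(25, 2))          # figures quoted in the message
```

The map in the multi-segment case is what costs heap when a field has hundreds of millions of distinct values; the single-segment path allocates nothing.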