On 2/3/2020 5:17 PM, ChienHua wrote:
How should we expect query performance to be affected by splitting one
collection into more shards?
We expected query performance to degrade with more shards because of
the overhead of merging results from several shards.
However, the test results are not what we expected. Any ideas or experience
with the performance impact?
This is an often misunderstood aspect of Solr performance.
In situations with a very high query rate, splitting into shards is
generally going to reduce performance. This happens because, as you
mentioned, there is overhead from merging the results. A high query
rate will keep all the CPUs very busy, so there is no spare capacity
to absorb that extra work.
But in situations with a low query rate, more shards can actually make
things faster. This is a possibility when there is a significant
surplus of available CPU capacity: the subqueries for one query can
run concurrently, so even with the overhead of merging, the overall
result is faster.
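
To make the two cases concrete, here is a rough back-of-the-envelope
model in Python. It is only a sketch and not a measurement of Solr: the
function names, the fixed per-shard overhead, and the numbers are all
hypothetical, and it assumes the work for one query divides evenly
across shards that all live on one machine.

```python
def single_query_latency(work, shards, cores, overhead_per_shard):
    """Seconds for one query on an otherwise idle machine (toy model)."""
    per_shard = work / shards
    # With more shards than cores, the subqueries run in several waves.
    waves = -(-shards // cores)  # ceiling division
    # Assumption: one unit of merge/coordination overhead per extra shard.
    return waves * per_shard + overhead_per_shard * (shards - 1)


def saturated_throughput(work, shards, cores, overhead_per_shard):
    """Queries per second when every core is kept busy (toy model)."""
    cpu_per_query = work + overhead_per_shard * (shards - 1)
    return cores / cpu_per_query


if __name__ == "__main__":
    # Hypothetical numbers: 0.4 CPU-seconds per query, 8 cores,
    # 10 ms of overhead for each additional shard.
    for n in (1, 2, 4, 8):
        lat = single_query_latency(0.4, n, 8, 0.01)
        qps = saturated_throughput(0.4, n, 8, 0.01)
        print(f"shards={n}  idle latency={lat:.3f}s  max qps={qps:.1f}")
```

In this model the idle-machine latency drops as shards are added, while
the maximum sustainable query rate also drops, because each query costs
more total CPU time. Real results depend on hardware, caching, and the
queries themselves, so treat it as an illustration only.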
The size of the index can also affect this dynamic. If you take an
index that is way too big for a single machine and split it so it has
shards on multiple machines, that can improve query performance
dramatically.
Thanks,
Shawn