Hi,

We are planning to set up a SolrCloud cluster with 6 nodes for 3 million records (expected to grow to 5 million within a year). Each record has 150 fields, and the overall index size would be around 120 GB.
We plan to use NRT with a 5-second soft commit and a 1-minute hard commit. The expected load is 5,000 select queries per second and 7,000 inserts/updates per second.

Our records fall into 15 categories, but the categories are not evenly sized: a few of them hold more records than the others. Query traffic follows the same pattern, i.e. the categories with the most records also receive the highest volume of selects and updates.

Given this, we are unsure which type of sharding would give us better performance for both selects and updates: composite (compositeId) routing with 15 shards, or implicit routing with one shard per category (15 shards).

Our select queries will have at least 15 filters in fq and will make extensive use of function queries in sort. Updates will contain 6 integer fields, 5 string fields and 4 multi-valued string/integer fields.

If we choose implicit routing to boost select performance, our updates will be concentrated on a few shards (those of the major categories). Will this be a problem?

Also, which replica type should we choose for our situation: all NRT, or NRT with TLOG?

Thanks in advance!

Best,
Doss
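
P.S. To make the two options concrete, here is a rough Python sketch of the Collections API CREATE calls we have in mind. It only builds the request URLs and does not contact Solr. The base URL, the collection name "records", the category labels cat01..cat15, the "category" routing field and the replica counts are all placeholders for illustration, not our real setup.

```python
from urllib.parse import urlencode

# Placeholder values (assumptions, not our real environment).
SOLR_BASE = "http://localhost:8983/solr"
CATEGORIES = [f"cat{i:02d}" for i in range(1, 16)]  # 15 hypothetical category names


def create_url(params):
    """Build a Collections API URL from a dict of query parameters."""
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def composite_create(name, num_shards=15, nrt=1, tlog=0):
    """Option 1: compositeId router, Solr hashes the id to pick a shard."""
    return create_url({
        "action": "CREATE",
        "name": name,
        "router.name": "compositeId",
        "numShards": num_shards,
        "nrtReplicas": nrt,    # example replica counts only
        "tlogReplicas": tlog,
    })


def implicit_create(name, shards, route_field="category", nrt=1, tlog=0):
    """Option 2: implicit router, one named shard per category."""
    return create_url({
        "action": "CREATE",
        "name": name,
        "router.name": "implicit",
        "shards": ",".join(shards),   # explicit shard names
        "router.field": route_field,  # field whose value names the target shard
        "nrtReplicas": nrt,
        "tlogReplicas": tlog,
    })


def composite_doc_id(category, record_id):
    """With compositeId, the prefix before '!' is hashed to choose the shard."""
    return f"{category}!{record_id}"


if __name__ == "__main__":
    print(composite_create("records"))
    print(implicit_create("records", CATEGORIES))
    print(composite_doc_id("cat01", 42))
```

As far as we understand, with compositeId the category prefix only guarantees that records of one category land together; two categories can still hash to the same shard, so 15 shards would not necessarily mean one category per shard. With implicit routing the mapping is exact, which is why we worry about the hot shards.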