TLOG

Shawn Heisey Fri, 04 Jan 2019 09:34:14 -0800

On 1/3/2019 11:26 PM, Doss wrote:

We are planning to setup a SOLR cloud with 6 nodes for 3 million records
(expected to grow to 5 million in a year), with 150 fields and over all
index would come around 120GB.


We plan to use NRT with 5 sec soft commit and 1 min hard commit.

Five seconds is likely far too short an interval. That's somethingyou'll have to experiment with.

Expected query volume would be 5000 select hits per second and 7000 inserts
/ updates per second.

5000 queries per second is an extremely high query rate. I would guessthat six nodes is far too few to handle that much of a query load. Itmight also be plenty ... it's nearly impossible to gauge that with theinformation you've shared so far. Usually the only way to find out forsure is to actually BUILD the system and try it.

7000 documents inserted per second is also ambitious. It's achievable,but is almost certainly going to require parallel threads/processesindexing at the same time. That's going to reduce the query volume youcan handle.

If you expect 3 million documents to reach 120GB of index size, theneach of those documents must be fairly large. Large documents willindex more slowly, and can also reduce query capacity.

Memory will be your biggest challenge. If a Solr instance must handle120GB of index and achieve a high query volume, then you'll want thatSolr instance to have about 128GB of memory, so the entire index willfit into the operating system disk cache.

Our records can be classified under 15 categories, but they will not have
even number of records, few categories will have more number of records.

Queries will also come in the same pattern, that is., categories with high
number of records will get high volume of select / updates.

For this situation we are confused in choosing what type of sharding would
help us in better performance in both select and updates?

Composite / implicit - Composite with 15 shards or implicit based on 15
categories.

15 shards is probably far too many for only a few million documents,especially with the extremely high query volume and low host count youhave projected. With a high query volume, you want the absolute minimumnumber of shards possible ... one if you can. Handling several milliondocuments in a single shard is usually doable.

Our select queries will have minimum 15 filters in fq, with extensive
function queries used in sort.

When a query has multiple filters, they will generally all be run inparallel, not sequentially. This can affect the query volume you canhandle, it's very difficult to know whether the effect will be helpfulor harmful.

For our kind of situation which replica Type can we choose? All NRT or NRT
with TLOG ?

If you will only have two replicas, they should both be either NRT orTLOG. With more than two replicas, my suggestion would be to make twoof them TLOG and the rest PULL. One of the TLOG replicas will be electedleader, and all other replicas will copy the index from the leader,rather than do the independent indexing that NRT replicas do.


Thanks,
Shawn

Re: Regarding Shards - Composite / Implicit , Replica Type - NRT / TLOG

Reply via email to