Hi Markus,

Just as a data point for a very large sharded index, we have the full text of 
9.3 million books with an index size of about 6+ TB spread over 12 shards on 4 
machines. Each machine has 3 shards. The size of each shard ranges between 
475GB and 550GB.  We are definitely I/O bound. Our machines have 144GB of 
memory with about 16GB dedicated to the tomcat instance running the 3 Solr 
instances, which leaves about 120 GB (or 40GB per shard) for the OS disk cache. 
 We release a new index every morning and then warm the caches with several 
thousand queries.  I probably should add that our disk storage is a very high 
performance Isilon appliance that has over 500 drives and every block of every 
file is striped over no less than 14 different drives. (See blog for details *)

We have a very low number of queries per second (0.3-2 qps) and our modest 
response time goal is to keep 99th percentile response time for our application 
(i.e. Solr + application) under 10 seconds.

Our current performance statistics are:

average response time  300 ms
median response time   113 ms
90th percentile        663 ms    
95th percentile        1,691 ms

We had plans to do some performance testing to determine the optimum shard size 
and optimum number of shards per machine, but that has remained on the back 
burner for a long time as other higher priority items keep pushing it down on 
the todo list.

We would be really interested to hear about the experiences of people who have 
so many shards that the overhead of distributing the queries, and 
consolidating/merging the responses becomes a serious issue.


Tom Burton-West

http://www.hathitrust.org/blogs/large-scale-search

* 
http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-500000-volumes-5-million-volumes-and-beyond

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Tuesday, August 02, 2011 12:33 PM
To: solr-user@lucene.apache.org
Subject: Re: performance crossover between single index and sharding

Actually, i do worry about it. Would be marvelous if someone could provide 
some metrics for an index of many terabytes.

> [..] At some extreme point there will be diminishing
> returns and a performance decrease, but I wouldn't worry about that at all
> until you've got many terabytes -- I don't know how many but don't worry
> about it.
> 
> ~ David
> 
> -----
>  Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/performance-crossover-between-single-in
> dex-and-sharding-tp3218561p3219397.html Sent from the Solr - User mailing
> list archive at Nabble.com.

Reply via email to