Hi Markus,

Just as a data point for a very large sharded index: we have the full text of 9.3 million books, with an index size of a little over 6 TB spread over 12 shards on 4 machines. Each machine has 3 shards. The size of each shard ranges between 475 GB and 550 GB. We are definitely I/O bound. Our machines have 144 GB of memory, with about 16 GB dedicated to the Tomcat instance running the 3 Solr instances, which leaves about 120 GB (or 40 GB per shard) for the OS disk cache. We release a new index every morning and then warm the caches with several thousand queries. I should probably add that our disk storage is a very high performance Isilon appliance that has over 500 drives, and every block of every file is striped over no fewer than 14 different drives. (See blog for details *)
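As a rough sanity check on the memory figures above, here is a minimal Python sketch of the arithmetic. The numbers are the ones quoted in this message; the function and variable names are made up for illustration. Note that the 128 GB it computes is only an upper bound on what remains after the Tomcat heap. The "about 120 GB" figure presumably also allows for other memory use on the machine, which is an assumption.

```python
# Figures quoted in the message; all names here are illustrative only.
TOTAL_RAM_GB = 144
TOMCAT_HEAP_GB = 16
SHARDS_PER_MACHINE = 3
QUOTED_CACHE_PER_SHARD_GB = 40      # the "40 GB per shard" estimate
SHARD_SIZES_GB = (475, 550)         # smallest and largest shard


def cache_budget(total_ram_gb, jvm_gb, shards):
    """Return (RAM left after the JVM heap, that RAM divided evenly per shard)."""
    free = total_ram_gb - jvm_gb
    return free, free / shards


def cached_fraction(cache_gb, shard_gb):
    """Fraction of a shard that could sit in the OS disk cache at once."""
    return cache_gb / shard_gb


free_gb, per_shard_gb = cache_budget(TOTAL_RAM_GB, TOMCAT_HEAP_GB, SHARDS_PER_MACHINE)
print(f"upper bound on disk cache: {free_gb} GB ({per_shard_gb:.1f} GB per shard)")

for size in SHARD_SIZES_GB:
    frac = cached_fraction(QUOTED_CACHE_PER_SHARD_GB, size)
    print(f"{size} GB shard: about {frac:.1%} can be cached")
```

With roughly 40 GB of cache against 475 to 550 GB per shard, well under a tenth of each shard can be resident in the disk cache at any one time, which fits with the index being I/O bound.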

We have a very low number of queries per second (0.3-2 qps) and our modest response time goal is to keep 99th percentile response time for our application (i.e. Solr + application) under 10 seconds.

Our current performance statistics are:

average response time     300 ms
median response time      113 ms
90th percentile           663 ms
95th percentile         1,691 ms

We had plans to do some performance testing to determine the optimum shard size and optimum number of shards per machine, but that has remained on the back burner for a long time as other higher priority items keep pushing it down on the todo list.

We would be really interested to hear about the experiences of people who have so many shards that the overhead of distributing the queries, and consolidating/merging the responses becomes a serious issue.

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search

* http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-500000-volumes-5-million-volumes-and-beyond

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Tuesday, August 02, 2011 12:33 PM
To: solr-user@lucene.apache.org
Subject: Re: performance crossover between single index and sharding

Actually, i do worry about it. Would be marvelous if someone could provide some metrics for an index of many terabytes.

> [..] At some extreme point there will be diminishing
> returns and a performance decrease, but I wouldn't worry about that at all
> until you've got many terabytes -- I don't know how many but don't worry
> about it.
>
> ~ David
>
> -----
> Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/performance-crossover-between-single-index-and-sharding-tp3218561p3219397.html
> Sent from the Solr - User mailing list archive at Nabble.com.