Re: performance crossover between single index and sharding

Ken Krugler Tue, 02 Aug 2011 11:41:24 -0700

With low qps and multi-core servers, I believe one reason to have multiple 
shards on one server is to provide better parallelism for a request, and thus 
reduce your response time.
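
To make the idea concrete, here is a minimal sketch of the fan-out/merge 
shape I have in mind. It is not Solr code: search_shard, search_all and the 
toy term-count scoring are hypothetical names made up for illustration, and 
each "shard" is just an in-memory dict. Python threads only show the 
structure here; in CPython they would not speed up CPU-bound scoring.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def search_shard(shard, query, k):
    """Hypothetical per-shard search: score docs by term count, keep top k."""
    hits = [(text.split().count(query), doc_id) for doc_id, text in shard.items()]
    return heapq.nlargest(k, [hit for hit in hits if hit[0] > 0])


def search_all(shards, query, k=3):
    """Query every shard concurrently, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, query, k), shards))
    # Merge step: the global top k must be among the per-shard top k lists.
    return heapq.nlargest(k, (hit for part in partials for hit in part))


if __name__ == "__main__":
    shards = [
        {"a": "solr solr lucene", "b": "lucene"},
        {"c": "solr"},
        {"d": "solr solr solr"},
    ]
    print(search_all(shards, "solr", k=2))
```

With one big shard the whole scan is one unit of work; with three shards the 
scan is split into three units that can run on separate cores, at the cost 
of the extra merge step.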


-- Ken

On Aug 2, 2011, at 11:06am, Jonathan Rochkind wrote:

> What's the reasoning behind having three shards on one machine, instead of 
> just combining those into one shard? Just curious.  I had been thinking the 
> point of shards was to get them on different machines, and there'd be no 
> reason to have multiple shards on one machine.
> 
> On 8/2/2011 1:59 PM, Burton-West, Tom wrote:
>> Hi Markus,
>> 
>> Just as a data point for a very large sharded index, we have the full text 
>> of 9.3 million books with an index size of about 6+ TB spread over 12 shards 
>> on 4 machines. Each machine has 3 shards. The size of each shard ranges 
>> between 475GB and 550GB.  We are definitely I/O bound. Our machines have 
>> 144GB of memory with about 16GB dedicated to the Tomcat instance running the 
>> 3 Solr instances, which leaves about 120 GB (or 40GB per shard) for the OS 
>> disk cache.  We release a new index every morning and then warm the caches 
>> with several thousand queries.  I probably should add that our disk storage 
>> is a very high performance Isilon appliance that has over 500 drives and 
>> every block of every file is striped over no less than 14 different drives. 
>> (See blog for details *)
>> 
>> We have a very low number of queries per second (0.3-2 qps) and our modest 
>> response time goal is to keep 99th percentile response time for our 
>> application (i.e. Solr + application) under 10 seconds.
>> 
>> Our current performance statistics are:
>> 
>> average response time  300 ms
>> median response time   113 ms
>> 90th percentile        663 ms
>> 95th percentile        1,691 ms
>> 
>> We had plans to do some performance testing to determine the optimum shard 
>> size and optimum number of shards per machine, but that has remained on the 
>> back burner for a long time as other higher priority items keep pushing it 
>> down on the todo list.
>> 
>> We would be really interested to hear about the experiences of people who 
>> have so many shards that the overhead of distributing the queries and 
>> consolidating/merging the responses becomes a serious issue.
>> 
>> 
>> Tom Burton-West
>> 
>> http://www.hathitrust.org/blogs/large-scale-search
>> 
>> * 
>> http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-500000-volumes-5-million-volumes-and-beyond
>> 
>> -----Original Message-----
>> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
>> Sent: Tuesday, August 02, 2011 12:33 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: performance crossover between single index and sharding
>> 
>> Actually, I do worry about it. It would be marvelous if someone could provide
>> some metrics for an index of many terabytes.
>> 
>>> [..] At some extreme point there will be diminishing
>>> returns and a performance decrease, but I wouldn't worry about that at all
>>> until you've got many terabytes -- I don't know how many but don't worry
>>> about it.
>>> 
>>> ~ David
>>> 
>>> -----
>>>  Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/performance-crossover-between-single-index-and-sharding-tp3218561p3219397.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
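
As a rough check on the memory figures Tom quotes above, the arithmetic can 
be sketched as follows. The constants are taken from his message; the 
variable names and the cached_fraction helper are my own, and the result is 
only a back-of-the-envelope estimate of how much of each shard can sit in 
the OS disk cache at once.

```python
# Figures quoted in the thread (per machine).
TOTAL_RAM_GB = 144
TOMCAT_GB = 16
CACHE_PER_SHARD_GB = 40          # "about 120 GB (or 40GB per shard)"
SHARD_SIZES_GB = (475, 550)      # smallest and largest shard


def cached_fraction(cache_gb, shard_gb):
    """Fraction of a shard that fits in its share of the OS disk cache."""
    return cache_gb / shard_gb


# RAM not given to Tomcat; the thread rounds this down to about 120 GB.
headroom_gb = TOTAL_RAM_GB - TOMCAT_GB

# Best case is the smallest shard, worst case the largest.
best = cached_fraction(CACHE_PER_SHARD_GB, min(SHARD_SIZES_GB))
worst = cached_fraction(CACHE_PER_SHARD_GB, max(SHARD_SIZES_GB))

if __name__ == "__main__":
    print(f"RAM left after Tomcat: {headroom_gb} GB")
    print(f"Cacheable share of a shard: {worst:.1%} to {best:.1%}")
```

Under those numbers less than a tenth of each shard fits in cache at any 
time, which is consistent with the setup being I/O bound.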

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom data mining solutions
