Re: performance crossover between single index and sharding

Shawn Heisey Wed, 03 Aug 2011 07:34:05 -0700

Replies inline.

On 8/3/2011 2:24 AM, Bernd Fehling wrote:

To show that I am comparing apples and oranges, here is my previous FAST Search setup:

- one master server (controlling, logging, search dispatcher)
- six index servers (4.25 million docs per server, 5 slices per index)
  (searching and indexing at the same time, indexing once per week during the weekend)
- each server has 4GB RAM, all servers are physical, on separate machines
- RAM usage controlled by the processes
- total of 25.5 million docs (mainly metadata) from 1500 databases worldwide
- index size is about 67GB per indexer --> about 402GB total
- about 3 qps at peak times
- with average search time of 0.05 seconds at peak times

An average query time of 50 milliseconds isn't too bad. If the number from your Solr setup below (39.5) is the QTime, then Solr thinks it is performing better, but Solr's QTime does not include absolutely everything that has to happen. Do you by chance have 95th and 99th percentile query times for either system?
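As a rough illustration of why percentiles matter alongside the average, here is a minimal sketch that computes nearest-rank percentiles from a list of query times. The sample values and the `percentile` helper are made up for the example; real numbers would come from your own request logs.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical query times in milliseconds (not measured data).
query_times_ms = [12, 15, 18, 22, 25, 31, 40, 48, 55, 950]

mean_ms = sum(query_times_ms) / len(query_times_ms)
p50_ms = percentile(query_times_ms, 50)
p95_ms = percentile(query_times_ms, 95)
p99_ms = percentile(query_times_ms, 99)

print(f"mean={mean_ms:.1f} ms  p50={p50_ms} ms  p95={p95_ms} ms  p99={p99_ms} ms")
```

With these invented samples, a single slow query pulls the mean far above the median, which is the kind of effect an average alone hides.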

And here is now my current Solr setup:
- one master server (indexing only)
- two slave servers (search only), but only one is online; the second is a fallback
- each server has 32GB RAM, all servers are virtual
  (master on a separate physical machine, both slaves together on a physical machine)
- RAM usage is currently 20GB to java heap
- total of 31 million docs (all metadata) from 2000 databases worldwide
- index size is 156GB total
- search handler statistics report 0.6 average requests per second
- average time per request 39.5 (is that seconds?)
- building the index from scratch takes about 20 hours

I can't tell whether you mean that each physical host has 32GB or each VM has 32GB. You want to be sure that you are not oversubscribing your memory. If you can get more memory in your machines, you really should. Do you know whether that 0.6 seconds is most of the delay that a user sees when making a search request, or are there other things going on that contribute more delay? In our webapp, the Solr request time is usually small compared with everything else the server and the user's browser are doing to render the results page. As much as I hate being the tall pole in the tent, I look forward to the day when the developers can change that balance.

The good thing is I have the ability to compare a commercial product and enterprise system to open source.
I started with my simple Solr setup because of "kiss" (keep it simple and stupid). Actually, it is doing very well as a single index on a single virtual server. But the average time per request should be reduced now; that's why I started this discussion.
While searches with a smaller Solr index (3 million docs) showed that it can keep up with FAST Search, it now shows that it's time to go with sharding.
I think we are already far beyond the point of search performance crossover.
What I hope to get with sharding:
- reduce time for building the index
- reduce average time per request

You will probably achieve both of these things by sharding, especially if you have a lot of CPU cores available. Like mine, your query volume is very low, so the CPU cores are better utilized distributing the search.

What I fear with sharding:

- I currently have master/slave, do I then have e.g. 3 masters and 3 slaves?
- the query changes because of sharding (is there a search distributor?)
- how to distribute the content for the indexer with DIH across 3 servers?
- anything else to think about while changing to sharding?

I think sharding is probably a good idea for you, as long as you don't lose redundancy. You can duplicate the FAST concept of a master server in a Solr core with no index. The solrconfig.xml for the core needs to include the shards parameter. That core combined with those shards will make up one complete index chain, and you need to have at least two complete chains, running on separate physical hardware. A load balancer will be critical. I use two small VMs on separate hosts with heartbeat and haproxy for mine.
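To make the role of the shards parameter concrete, here is a minimal sketch of a client building a distributed query URL. It passes `shards` on the request itself, which Solr also accepts, rather than setting it as a default in solrconfig.xml as described above. All host names, core names, and the `distributed_query_url` helper are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical shard cores (host:port/path, without the http:// prefix).
SHARDS = [
    "idx1.example.com:8983/solr/shard1",
    "idx2.example.com:8983/solr/shard2",
    "idx3.example.com:8983/solr/shard3",
]


def distributed_query_url(broker_base, query, shards, rows=10):
    """Build a /select URL that asks the broker core to fan the query
    out to every listed shard and merge the results."""
    params = {
        "q": query,
        "rows": rows,
        "shards": ",".join(shards),  # comma-separated shard list
    }
    return f"{broker_base}/select?{urlencode(params)}"


# Hypothetical index-less broker core, normally reached through the load balancer.
url = distributed_query_url(
    "http://broker.example.com:8983/solr/broker", "title:solr", SHARDS
)
print(url)
```

If the shard list lives in the broker core's solrconfig.xml instead, clients can send an ordinary query to the broker and never need to know the shard layout.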


Thanks,
Shawn
