Hi,

Our system [1] consists of over 220 million semi-structured web documents (RDF, Microformats, etc.), ranging from fairly small documents (a few KB) to large ones (a few MB). On top of that, each document has about a dozen additional fields for indexing and storing metadata about the document.
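To give a rough picture of what one of these documents looks like on the Solr side, it is basically the raw payload plus its metadata fields. A minimal SolrJ sketch (the field names are invented for the example, not our actual schema):

import org.apache.solr.common.SolrInputDocument;

public class ExampleDocument {
  // Builds one Solr document: the payload itself plus a handful of metadata fields.
  // Field names are hypothetical, for illustration only.
  public static SolrInputDocument build() {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "http://example.org/some-page");
    doc.addField("content", "... raw RDF / Microformats payload, a few KB to a few MB ...");
    doc.addField("domain", "example.org");
    doc.addField("format", "RDF");
    doc.addField("fetched_at", "2011-05-12T00:00:00Z");
    doc.addField("triple_count", 42);
    // ... and so on, up to roughly a dozen metadata fields per document.
    return doc;
  }
}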

It runs on top of Solr 3.1 with the following configuration:
- 2 master indexes
- 2 slave indexes
Each server is a quad-core with 32 GB of RAM and 4 SATA drives in RAID 10.
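The split is the usual one: all updates go to the masters, all user queries go to the slaves. A minimal SolrJ sketch of that split (hostnames and field names are hypothetical, not our actual setup):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class MasterSlaveClients {
  public static void main(String[] args) throws Exception {
    // Hypothetical hostnames: writes go to a master, reads go to a slave.
    SolrServer master = new CommonsHttpSolrServer("http://master1:8983/solr");
    SolrServer slave  = new CommonsHttpSolrServer("http://slave1:8983/solr");

    // Updates are only ever sent to the masters...
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "http://example.org/another-page");
    master.add(doc);

    // ...while user queries only hit the slaves, which serve the last replicated snapshot.
    QueryResponse rsp = slave.query(new SolrQuery("semantic web"));
    System.out.println("hits: " + rsp.getResults().getNumFound());
  }
}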

Indexing performance is quite good: we can reindex our full data collection in less than a day (using only the two master indexes). Live updates (a few million documents per day) are processed continuously by our masters, and we replicate the changes to the slave indexes every hour. Query performance is also fine (you can try it for yourself at [1]).
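The replication itself is standard Solr 3.x master/slave replication. As a rough sketch of one way to drive an hourly pull, assuming the standard /replication handler is enabled on the slave and automatic polling is disabled so the pull is triggered externally (e.g. from a cron job):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class TriggerReplication {
  public static void main(String[] args) throws Exception {
    // Hypothetical slave URL; asks the slave to pull the latest index from its master.
    URL url = new URL("http://slave1:8983/solr/replication?command=fetchindex");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try {
      if (conn.getResponseCode() != 200) {
        throw new RuntimeException("fetchindex failed: HTTP " + conn.getResponseCode());
      }
      InputStream body = conn.getInputStream();
      body.close(); // the response only acknowledges that the fetch was started
    } finally {
      conn.disconnect();
    }
  }
}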

As a side note, we are using Solr 3.1 plus a plugin we have developed for indexing semi-structured data. This plugin adds much more data to the index than plain Solr does, so with plain Solr you can expect even better indexing performance.

[1] http://sindice.com
--
Renaud Delbru

On 12/05/11 17:59, atreyu wrote:
Hi,

I have about 300 million docs (or 10TB data) which is doubling every 3
years, give or take.  The data mostly consists of Oracle records, webpage
files (HTML/XML, etc.) and office doc files.  There are b/t two and four
dozen concurrent users, typically.  The indexing server has > 27 GB of RAM,
but it still gets extremely taxed, and this will only get worse.

Would Solr be able to efficiently deal with a load of this size?  I am
trying to avoid the heavy cost of GSA, etc...

Thanks.


