Thanks Otis! Please ignore my earlier email, which did not have all the information.
My business requirements have changed a bit. We now need one year of rolling data in Production, with the following details:

- Number of records: 1.2 million
- Solr index size for these records: approximately 200 - 220 GB (includes large attachments)
- Approximately 250 users who will be searching the application, with a peak of 1 search request every 40 seconds

I am planning to address this using Solr distributed search on a VMware virtualized environment as follows:

1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves (load balanced)
2. Master configuration for each server: 4 CPUs, 16 GB RAM, 300 GB disk space
3. Slave configuration for each server: 4 CPUs, 16 GB RAM, 150 GB disk space
4. I am planning to use SAN instead of local storage to store the Solr index.

My questions are as follows:
- Will 3 shards serve the purpose here?
- Is SAN a good option for storing the Solr index, given the high index volume?

On Mon, Nov 21, 2011 at 3:05 PM, Rahul Warawdekar <rahul.warawde...@gmail.com> wrote:

> Thanks!
>
> My business requirements have changed a bit.
> We need one year of rolling data in Production.
> The index size for the same comes to approximately 200 - 220 GB.
> I am planning to address this using Solr distributed search as follows.
>
> 1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves (load balanced)
> 2. Master configuration will be 4 CPU
>
> On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>
>> Hi Rahul,
>>
>> This is unfortunately not enough information for anyone to give you very precise answers, so I'll just give some rough ones:
>>
>> * best disk - SSD :)
>> * CPU - multicore, depends on query complexity, concurrency, etc.
>> * sharded search and failover - start with SolrCloud; there are a couple of pages about it on the Wiki and
>> http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/
>>
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>> > ________________________________
>> > From: Rahul Warawdekar <rahul.warawde...@gmail.com>
>> > To: solr-user <solr-user@lucene.apache.org>
>> > Sent: Tuesday, October 11, 2011 11:47 AM
>> > Subject: Architecture and Capacity planning for large Solr index
>> >
>> > Hi All,
>> >
>> > I am working on a Solr search based project and would highly appreciate help/suggestions from you all regarding Solr architecture and capacity planning.
>> > Details of the project are as follows:
>> >
>> > 1. There are 2 databases from which data needs to be indexed and made searchable:
>> >    - Production
>> >    - Archive
>> > 2. The Production database will retain 6 months of data and archive data every month.
>> > 3. The Archive database will retain 3 years of data.
>> > 4. The database is SQL Server 2008 and the Solr version is 3.1.
>> >
>> > Data to be indexed contains a huge volume of attachments (PDF, Word, Excel, etc.), approximately 200 GB per month.
>> > We are planning to do a full index every month (multithreaded) and incremental indexing on a daily basis.
>> > The Solr index size comes to approximately 25 GB per month.
>> >
>> > If we were to use distributed search, what would be the best configuration for Production as well as Archive indexes?
>> > What would be the best CPU/RAM/Disk configuration?
>> > How can I implement a failover mechanism for sharded searches?
>> >
>> > Please let me know in case I need to share more information.
>> >
>> > --
>> > Thanks and Regards
>> > Rahul A. Warawdekar
>
> --
> Thanks and Regards
> Rahul A. Warawdekar

--
Thanks and Regards
Rahul A. Warawdekar
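As a rough sanity check on the 3-shard plan discussed above, the per-shard numbers from the thread can be sketched like this (all figures are the estimates quoted in the emails, not measurements):

```python
# Back-of-envelope capacity check for the proposed 3-shard layout.
# Figures are the estimates from the thread; adjust to actual measurements.
total_index_gb = 220        # upper estimate for one year of rolling data
num_shards = 3
ram_per_node_gb = 16        # proposed master/slave node RAM

per_shard_gb = total_index_gb / num_shards
print(f"Index per shard: {per_shard_gb:.0f} GB")

# With 16 GB RAM per node, only a fraction of each ~73 GB shard can live
# in the OS page cache, so disk latency (SAN vs. local) will dominate
# cold and warm-up query performance.
cache_fraction = ram_per_node_gb / per_shard_gb
print(f"At best ~{cache_fraction:.0%} of a shard fits in RAM")
```

This is only arithmetic, not a benchmark; whether 3 shards suffice depends mainly on query latency against a mostly-uncached index, which the stated 1 request per 40 seconds should tolerate.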