A quick add-on to this: we have over 30 million documents. I take it that we should be looking at Distributed Solr? See http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e344
Thanks.

On Mon, Feb 27, 2012 at 2:33 PM, Memory Makers <memmakers...@gmail.com> wrote:

> Many thanks for the response.
>
> Here are the revised questions:
>
> For example, if I have N processes that are producing documents to index:
> 1. Should I have them simultaneously submit documents to Solr (will this
> improve the indexing throughput)?
> 2. Is there anything I can do Solr configuration-wise that will allow me
> to speed up indexing?
> 3. Is there an architecture where I can have two (or more) Solr servers do
> indexing in parallel?
>
> Thanks.
>
> On Mon, Feb 27, 2012 at 1:46 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>
>> Yes, absolutely. Parallelizing indexing can make a huge difference. How
>> you do so will depend on your indexing environment. Most crudely, running
>> multiple indexing scripts on different subsets of data, up to the
>> limitations of your operating system and hardware, is how many do it.
>> SolrJ has some multithreaded facility, as does DataImportHandler.
>> Distributing the indexing to multiple machines, but pointing all to the
>> same Solr server, is effectively the same as multi-threading it.... push
>> documents into Solr from wherever as fast as it can handle it. This is
>> definitely how many do this.
>>
>> Erik
>>
>> On Feb 27, 2012, at 13:24, Memory Makers wrote:
>>
>> > Hi,
>> >
>> > Is there a way to speed up indexing by increasing the number of threads
>> > doing the indexing or perhaps by distributing indexing on multiple
>> > machines?
>> >
>> > Thanks.
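
For reference, the approach Erik describes (several workers pushing documents at the same Solr server) might be sketched roughly as below. This is only an illustration, not something from the thread: the update URL, the batch size, the worker count, and the helper names `chunked`, `post_batch`, and `index_parallel` are all assumptions, and the JSON update path in particular varies between Solr versions and core layouts.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: adjust to your deployment; the update path differs by Solr version.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update/json"


def chunked(docs, size):
    """Yield successive lists of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def post_batch(batch, url=SOLR_UPDATE_URL):
    """POST one batch as a JSON array of documents; return the batch size."""
    body = json.dumps(batch).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        response.read()  # drain the response; HTTP errors raise an exception
    return len(batch)


def index_parallel(docs, send=post_batch, workers=4, batch_size=500):
    """Send batches concurrently from a thread pool; return total docs sent."""
    batches = list(chunked(docs, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map re-raises any exception from a worker when results are read
        return sum(pool.map(send, batches))
```

Calling `index_parallel(docs, workers=8)` would push the batches from eight threads. The sketch does not issue a commit, so one would still need to be sent (or autoCommit configured) once the batches are in. The `send` parameter is there so the HTTP call can be swapped out, for example for a dry run.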