A quick add-on to this -- we have over 30 million documents.

I take it that we should be looking at distributed Solr, as described here?
http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e344
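
To make the question concrete, here is a rough sketch (Python, not run
against a real Solr) of what I imagine the client side would look like:
hash each document id to pick a server, then post batches from a pool of
threads. The server URLs, the batch size, the worker count, and the `send`
callback are all made up for illustration; `send` stands in for whatever
actually posts a batch to Solr and is assumed to return the number of
documents it accepted.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shard URLs, for illustration only.
SHARDS = [
    "http://solr1.example.com:8983/solr",
    "http://solr2.example.com:8983/solr",
]


def shard_for(doc_id, num_shards):
    """Pick a shard index from a stable checksum of the document id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def partition(docs, num_shards):
    """Group documents into one bucket per shard."""
    buckets = [[] for _ in range(num_shards)]
    for doc in docs:
        buckets[shard_for(doc["id"], num_shards)].append(doc)
    return buckets


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def index_all(docs, shards, send, batch_size=1000, workers=4):
    """Send every batch to its shard from a thread pool.

    `send(shard_url, batch)` is supplied by the caller and is assumed to
    return how many documents it submitted. Returns the overall total.
    """
    jobs = [
        (shards[i], batch)
        for i, bucket in enumerate(partition(docs, len(shards)))
        for batch in batches(bucket, batch_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda job: send(job[0], job[1]), jobs))
```

With a stand-in `send` that just returns `len(batch)`, the total returned
by `index_all(docs, SHARDS, send)` should equal `len(docs)`, which is a
cheap way to check the routing before pointing it at real servers.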

Thanks.

On Mon, Feb 27, 2012 at 2:33 PM, Memory Makers <memmakers...@gmail.com> wrote:

> Many thanks for the response.
>
> Here are the revised questions:
>
> For example, if I have N processes that are producing documents to index:
> 1. Should I have them submit documents to Solr simultaneously (will this
> improve the indexing throughput)?
> 2. Is there anything I can do configuration-wise in Solr that will allow
> me to speed up indexing?
> 3. Is there an architecture where I can have two (or more) Solr servers
> do indexing in parallel?
>
> Thanks.
>
> On Mon, Feb 27, 2012 at 1:46 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>
>> Yes, absolutely.  Parallelizing indexing can make a huge difference.  How
>> you do so will depend on your indexing environment.  Most crudely, running
>> multiple indexing scripts on different subsets of data up to the
>> limitations of your operating system and hardware is how many do it.
>> SolrJ has some multithreaded facility, as does DataImportHandler.
>> Distributing the indexing to multiple machines, but pointing all to the
>> same Solr server, is effectively the same as multi-threading it: push
>> documents into Solr from wherever, as fast as it can handle them.  This is
>> definitely how many do this.
>>
>>        Erik
>>
>> On Feb 27, 2012, at 13:24 , Memory Makers wrote:
>>
>> > Hi,
>> >
>> > Is there a way to speed up indexing by increasing the number of threads
>> > doing the indexing or perhaps by distributing indexing on multiple
>> > machines?
>> >
>> > Thanks.
>>
>>
>
