I presume you are running Solr on a multi-core/CPU server. If you kept a
single process hitting Solr to re-index, you'd be using just one of
those cores. It would take as long as it takes; I can't see how you
would 'overload' it that way.

I guess you could have a strategy that pulls 100 documents with an old
last_indexed and pushes them for re-indexing. If you get the full 100
docs back, you make a subsequent request immediately. If you get fewer
than 100 back, you know you're up-to-date and can wait, say, 30s before
making another request.
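
A minimal sketch of that loop in Python, using the requests library.
The core URL, the REINDEX_CUTOFF value, and the assumption that
last_indexed is set on every document are things you'd adapt to your
own setup; re-posting fetched docs only works because, as you say, all
your indexed fields are stored:

import time
import requests

SOLR = "http://localhost:8983/solr/collection1"  # assumed core URL
BATCH = 100
IDLE_WAIT = 30  # seconds to back off once we are caught up
# Assumed cutoff: anything last indexed before this still needs work.
REINDEX_CUTOFF = "2013-05-29T00:00:00Z"

def fetch_stale_batch():
    # Pull up to BATCH docs whose last_indexed predates the cutoff,
    # oldest first.
    params = {
        "q": "last_indexed:[* TO %s]" % REINDEX_CUTOFF,
        "sort": "last_indexed asc",
        "rows": BATCH,
        "wt": "json",
    }
    resp = requests.get(SOLR + "/select", params=params)
    resp.raise_for_status()
    return resp.json()["response"]["docs"]

def reindex(docs):
    # Re-submit the docs with a fresh last_indexed timestamp, so they
    # fall out of the stale query. Works because all fields are stored.
    now = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    for doc in docs:
        doc.pop("_version_", None)  # let Solr assign a new version
        doc["last_indexed"] = now
    resp = requests.post(SOLR + "/update", json=docs)
    resp.raise_for_status()

while True:
    docs = fetch_stale_batch()
    if docs:
        reindex(docs)
    if len(docs) < BATCH:
        time.sleep(IDLE_WAIT)  # up-to-date: wait before polling again

A full batch means more stale docs are waiting, so the loop makes the
next request immediately; a short batch means you're caught up. Commits
are left to Solr's autoCommit settings here; you could also pass
commitWithin on the update request instead.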

Upayavira

On Wed, May 29, 2013, at 12:00 PM, Dotan Cohen wrote:
> I see that I do need to reindex my Solr index. The index consists of
> 20 million documents with a few hundred new documents added per minute
> (social media data). The documents are mostly smaller than 1KiB of
> data, but some may go as large as 10 KiB. All the data is text, and
> all indexed fields are stored.
> 
> To reindex, I am considering adding a 'last_indexed' field, and having
> a Python or Java application pull out N results every T seconds when
> sorting on "last_indexed asc". How might I determine good values for
> N and T? I would like to know when the Solr index is 'overloaded', or
> whatever happens to Solr when it is being pushed beyond the limits of
> its hardware. What should I be looking at to know if Solr is
> overstressed? Is looking at CPU and memory good enough? Is there a way to
> measure I/O to the disk on which the Solr index is stored? Bear in
> mind that while the reindex is happening, clients will be performing
> searches and a few hundred documents will be written per minute. Note
> that the machine running Solr is an EC2 instance running on Amazon Web
> Services, and that the 'disk' on which the Solr index is stored is an
> EBS volume.
> 
> Thank you.
> 
> --
> Dotan Cohen
> 
> http://gibberish.co.il
> http://what-is-what.com
