I presume you are running Solr on a multi-core/CPU server. If you kept a
single process hitting Solr to re-index, you'd be using just one of those
cores. It would take as long as it takes; I can't see how you would
'overload' it that way.
I guess you could have a strategy that pulls 100 documents with an old
last_indexed and pushes them for re-indexing. If you get the full 100
docs back, you make a subsequent request immediately. If you get fewer
than 100 back, you know you're up to date and can wait, say, 30s before
making another request. A rough sketch of such a loop is at the bottom
of this mail.

Upayavira

On Wed, May 29, 2013, at 12:00 PM, Dotan Cohen wrote:
> I see that I do need to reindex my Solr index. The index consists of
> 20 million documents, with a few hundred new documents added per
> minute (social media data). The documents are mostly smaller than
> 1 KiB of data, but some may go as large as 10 KiB. All the data is
> text, and all indexed fields are stored.
>
> To reindex, I am considering adding a 'last_indexed' field, and having
> a Python or Java application pull out N results every T seconds,
> sorting on "last_indexed asc". How might I determine good values for
> N and T? I would like to know when the Solr index is 'overloaded', or
> whatever happens to Solr when it is being pushed beyond the limits of
> its hardware. What should I be looking at to know if Solr is
> overstressed? Is looking at CPU and memory good enough? Is there a way
> to measure I/O to the disk on which the Solr index is stored? Bear in
> mind that while the reindex is happening, clients will be performing
> searches and a few hundred documents will be written per minute. Note
> that the machine running Solr is an EC2 instance running on Amazon Web
> Services, and that the 'disk' on which the Solr index is stored is an
> EBS volume.
>
> Thank you.
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
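
For what it's worth, here is a minimal sketch of that loop in Python,
talking to Solr over its plain HTTP/JSON API with the requests library.
The core URL, the batch size and the soft-commit choice are assumptions
to adjust for your setup; it only works because, as you say, all
indexed fields are stored, so whole documents can be read back and
re-added.

    import time
    import requests

    SOLR = "http://localhost:8983/solr/mycore"  # assumed core URL
    BATCH = 100       # documents per pull (the N above)
    IDLE_WAIT = 30    # seconds to sleep once caught up (the T above)

    def reindex_loop():
        while True:
            # Pull the batch that has gone longest without re-indexing.
            resp = requests.get(SOLR + "/select", params={
                "q": "*:*",
                "sort": "last_indexed asc",
                "rows": BATCH,
                "wt": "json",
            })
            docs = resp.json()["response"]["docs"]

            for doc in docs:
                # Drop the internal version so Solr assigns a fresh one.
                doc.pop("_version_", None)
                doc["last_indexed"] = time.strftime(
                    "%Y-%m-%dT%H:%M:%SZ", time.gmtime())

            if docs:
                # Re-push the batch. A soft commit keeps searchers
                # current without forcing a flush to (EBS-backed) disk
                # on every batch.
                requests.post(SOLR + "/update",
                              params={"softCommit": "true"},
                              json=docs)

            if len(docs) < BATCH:
                # A short batch means we're caught up: back off before
                # polling again.
                time.sleep(IDLE_WAIT)

    if __name__ == "__main__":
        reindex_loop()

Since this runs one pull at a time, it naturally self-throttles: the
next request only goes out after Solr has answered the previous one, so
it shouldn't swamp the server even with no explicit delay between full
batches.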