First off, optimize is actually rarely necessary. I wouldn't bother unless you have measurements to prove that it's desirable.
I would _certainly_ not call optimize every 10M docs. If you must call it at all, call it exactly once when indexing is complete. But see above.

As for commits, I'd just set the autocommit settings in solrconfig.xml to something "reasonable" and forget it. I usually use time rather than doc count, as it's a little more predictable. I often use 60 seconds, but it can be longer. The longer the interval, the bigger your tlog will grow, and the longer replaying it may take if Solr shuts down forcefully. Here's the whole writeup on this topic:
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Running out of space during indexing with about 30% utilization is very odd. My guess is that you're trying to take too much control. Having multiple optimizations going on at once would be a very good way to run out of disk space, since each one temporarily needs room for both the old and the new segments.

And I'm assuming either one replica's index per disk, or that the 30% you quote is the aggregate index size per disk. Having three replicas on the same disk, each consuming 30%, is A Bad Thing.

Best,
Erick

On Mon, Dec 12, 2016 at 8:36 AM, Michael Joyner <mich...@newsrx.com> wrote:
> Halp!
>
> I need to reindex over 43 million documents. When optimized, the collection
> is currently < 30% of disk space. We tried it over this weekend and it ran
> out of space during the reindexing.
>
> I'm thinking the best solution for what we are trying to do is to call
> commit/optimize every 10,000,000 documents or so and then wait for the
> optimize to complete.
>
> How do I check the optimized status via SolrJ for a particular collection?
>
> Also, is there a way to check free space per shard by collection?
>
> -Mike
>
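For reference, here is a minimal sketch of the time-based autocommit described in the reply above. It assumes a Solr 6-era solrconfig.xml, where these settings sit inside `<updateHandler>`. The 60-second interval matches the one mentioned above. `openSearcher=false` is my own assumption and was not stated in the reply; it is a common choice for bulk indexing and is discussed in the linked post. With this in place, the indexing client doesn't need to issue commits or optimizes while it runs.

```xml
<!-- Excerpt from solrconfig.xml; only the commit-related parts are shown. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log (tlog); SolrCloud requires it. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit on a timer rather than a document count.
       maxTime is in milliseconds: 60000 ms = 60 seconds.
       A longer interval lets each tlog grow larger and can lengthen
       replay after a forced shutdown. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <!-- Assumption: don't open a new searcher on each hard commit, so
         these commits persist segments and roll over the tlog without
         making new documents visible to searches. -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

If you go this route, call optimize at most once, after the whole reindex has finished, and only if measurements show it helps.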