First off, optimize is actually rarely necessary. I wouldn't bother
unless you have measurements to prove that it's desirable.

I would _certainly_ not call optimize every 10M docs. If you must call
it at all, call it exactly once, when indexing is complete. But see
above.

As for commits, I'd just set the autocommit settings in
solrconfig.xml to something "reasonable" and forget it. I usually use
time rather than doc count as it's a little more predictable. I often
use 60 seconds, but it can be longer. The longer the interval, the
bigger your tlog will grow and, if Solr shuts down forcefully, the
longer replaying it may take. Here's the whole writeup on this topic:

https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
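
For reference, here is a rough sketch of the kind of autocommit block I
mean. The 60000 ms value is just the 60 seconds mentioned above, and
openSearcher=false is an assumption on my part (it keeps hard commits
from opening a new searcher), so adjust both to your setup:

```xml
<!-- Goes inside the <updateHandler> section of solrconfig.xml -->
<autoCommit>
  <!-- Hard commit every 60 seconds (value is in milliseconds) -->
  <maxTime>60000</maxTime>
  <!-- Assumption: flush segments and roll the tlog without opening a searcher -->
  <openSearcher>false</openSearcher>
</autoCommit>
```

With openSearcher set to false, newly indexed docs only become visible
after a soft commit or an explicit commit, which is usually fine for a
bulk reindex.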

Running out of space during indexing with about 30% utilization is
very odd. My guess is that you're trying to take too much control.
Having multiple optimizations going on at once would be a very good
way to run out of disk space.

And I'm assuming one replica's index per disk, or that you're reporting
aggregate index size per disk, when you say 30%. Having three replicas
on the same disk each consuming 30% is A Bad Thing.

Best,
Erick

On Mon, Dec 12, 2016 at 8:36 AM, Michael Joyner <mich...@newsrx.com> wrote:
> Halp!
>
> I need to reindex over 43 millions documents, when optimized the collection
> is currently < 30% of disk space, we tried it over this weekend and it ran
> out of space during the reindexing.
>
> I'm thinking for the best solution for what we are trying to do is to call
> commit/optimize every 10,000,000 documents or so and then wait for the
> optimize to complete.
>
> How to check optimized status via solrj for a particular collection?
>
> Also, is there is a way to check free space per shard by collection?
>
> -Mike
>
