What version of Solr are you using?

About committing: I'd just let the Solr defaults handle that. You configure
this in the autocommit section of solrconfig.xml. Autocommit happens on the
server side, so it's triggered no matter how the documents get in, including
via SolrJ.
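
For reference, that section of solrconfig.xml looks something like the
snippet below; the thresholds are illustrative only, so tune them to your
hardware and update rate:

    <autoCommit>
      <maxDocs>10000</maxDocs> <!-- commit once this many docs are pending -->
      <maxTime>60000</maxTime> <!-- or after this many ms, whichever comes first -->
    </autoCommit>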

That said, it's probably wise to issue a commit after all your data is
indexed too, just to flush any remaining documents since the last autocommit.
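
In SolrJ that final commit is a single call at the end of the indexing run.
Here's a minimal sketch, assuming a server at the default URL and made-up
field names; the batch size of 500 is just illustrative, not a
recommendation:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexer {
      public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", Integer.toString(i));
          doc.addField("title", "Document " + i);
          batch.add(doc);

          // Send documents in batches to cut HTTP round trips;
          // let the server-side autocommit decide when to commit.
          if (batch.size() == 500) {
            server.add(batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) {
          server.add(batch);
        }

        // One explicit commit at the very end flushes whatever is
        // still pending since the last autocommit.
        server.commit();
      }
    }

The main point is that the batch size mostly affects HTTP overhead, not
index correctness, so don't agonize over it; just avoid committing per
document.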

Optimize should not be issued until you're completely done indexing, if at
all. If you're not deleting (or updating) documents, don't bother to
optimize unless the number of files in your index directory gets really
large. (Updates count here because a Lucene update is a delete plus a
re-add, which leaves deleted documents behind in the segments.) Recent Solr
code almost removes the need to optimize unless you delete documents, but I
confess I don't know the revision number "recent" refers to, perhaps only
trunk...

HTH
Erick

On Thu, Oct 28, 2010 at 9:56 AM, Savvas-Andreas Moysidis <
savvas.andreas.moysi...@googlemail.com> wrote:

> Hello,
>
> We currently index our data through a SQL-DIH setup, but because our model
> (and therefore the SQL query) is becoming complex, we need to index our
> data programmatically. As we didn't have to deal with commit/optimise
> before, we are now wondering whether there is an optimal approach. Is
> there a batch size after which we should fire a commit, or should we
> execute a commit after indexing all of our data? What about optimise?
>
> Our document corpus is > 4m documents, and through DIH the resulting
> index is around 1.5G.
>
> We have searched previous posts but couldn't find a definite answer. Any
> input much appreciated!
>
> Regards,
> -- Savvas
>