What version of Solr are you using? As for committing, I'd just let the Solr defaults handle that. You configure this in the autocommit section of solrconfig.xml. I'm pretty sure this gets triggered even if you're indexing through SolrJ.
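For reference, the autocommit settings live under the update handler in solrconfig.xml and look something like the sketch below; the thresholds are illustrative, not recommendations -- tune them for your own load:

```xml
<!-- In solrconfig.xml -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Commit automatically once either threshold is reached -->
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- pending docs before an automatic commit -->
    <maxTime>60000</maxTime>  <!-- or milliseconds since the first uncommitted doc -->
  </autoCommit>
</updateHandler>
```

If the `<autoCommit>` element is commented out (the default in some releases), nothing is committed until you do it explicitly.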
That said, it's probably wise to issue a commit after all your data is indexed too, just to flush any remaining documents since the last autocommit. Optimize should not be issued until you're all done, if at all. If you're not deleting (or updating) documents, don't bother to optimize unless the number of files in your index directory gets really large. Recent Solr code almost removes the need to optimize unless you delete documents, but I confess I don't know which revision "recent" refers to, perhaps only trunk...

HTH
Erick

On Thu, Oct 28, 2010 at 9:56 AM, Savvas-Andreas Moysidis <
savvas.andreas.moysi...@googlemail.com> wrote:

> Hello,
>
> We currently index our data through a SQL-DIH setup, but due to our model
> (and therefore SQL query) becoming complex, we need to index our data
> programmatically. As we didn't have to deal with commit/optimise before,
> we are now wondering whether there is an optimal approach to that. Is
> there a batch size after which we should fire a commit, or should we
> execute a commit after indexing all of our data? What about optimise?
>
> Our document corpus is > 4m documents, and through DIH the resulting
> index is around 1.5G.
>
> We have searched previous posts but couldn't find a definite answer. Any
> input much appreciated!
>
> Regards,
> -- Savvas
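To make the pattern above concrete, here's a minimal SolrJ sketch of batched adds with a single commit at the end. The URL, batch size, and field names are illustrative assumptions, not anything from your setup; `CommonsHttpSolrServer` is the client class in the SolrJ releases current as of this thread (1.4/3.x):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // Illustrative URL -- point this at your own Solr instance
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 4000000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", String.valueOf(i));      // field names are examples only
            doc.addField("name", "document " + i);
            batch.add(doc);

            // Send documents in batches to cut HTTP overhead;
            // let autoCommit decide when to actually commit.
            if (batch.size() == 1000) {
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }

        // One explicit commit at the end flushes whatever the last
        // autoCommit missed.
        server.commit();
        // server.optimize();  // usually unnecessary unless you delete/update docs
    }
}
```

The batch size here only controls how many documents travel in each HTTP request; it's independent of when commits happen, which is the point of leaving commits to autoCommit plus one final explicit commit().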