Hello,

We currently index our data through a SQL DataImportHandler (DIH) setup, but as our model (and therefore the SQL query) has grown complex, we need to index our data programmatically. Since we didn't have to deal with commit/optimise before, we are now wondering whether there is an optimal approach. Is there a batch size after which we should fire a commit, or should we execute a single commit after indexing all of our data? And what about optimise?
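For context, the indexing loop we have in mind looks roughly like the sketch below. The client interface, batch size, and field names are placeholder assumptions, not our actual code; the open question is where the commit (and possibly optimise) calls should go.

```python
# Sketch of a batched indexing loop (hypothetical client with
# add/commit methods; batch_size=1000 is an arbitrary starting point).

def index_all(client, documents, batch_size=1000):
    """Send documents to Solr in fixed-size batches, committing once at the end."""
    batch = []
    for doc in documents:
        batch.append(doc)
        if len(batch) >= batch_size:
            client.add(batch)   # add a batch without committing
            batch = []
    if batch:
        client.add(batch)       # flush the final partial batch
    client.commit()             # single commit after all documents
    # client.optimize()         # optionally optimise once, after the commit
```

The alternative would be calling `client.commit()` inside the loop after every batch, which is exactly the trade-off we are asking about.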
Our document corpus is > 4m documents, and through DIH the resulting index is around 1.5G. We have searched previous posts but couldn't find a definitive answer.

Any input much appreciated!

Regards,
-- Savvas