On 7/12/2013 9:23 AM, Tom Burton-West wrote:
> Do you have any feeling for what gets traded off if we increase the
> maxMergeCount?
> 
> This is completely new for us because we are experimenting with indexing
> pages instead of whole documents.  Since our average document is about 370
> pages, this means that we have increased the number of documents we are
> asking Solr to index by a couple of orders of magnitude. (on the other hand
> the size of the document decreases by a couple of orders of magnitude).
> I'm not sure why increasing the number of documents (and reducing their
> size) is causing more merges.  I'll have to investigate.

I'm not sure that you lose anything, really.  If everything is
proceeding normally before the "stalling" message is logged, I would not
expect a higher maxMergeCount to cause ANY problems.

I increased this value because when I did a full-import of millions of
documents from MySQL, I would reach the point where three different
levels of merges were going on at once.  Because the default thread
count is one, only the largest merge was actually running; the others
were queued and waiting.

With three merges stacked up at once, I had passed the maxMergeCount
threshold, so *indexing* stopped.  It can take several minutes for a
very large merge to finish, so indexing stopped long enough that the
MySQL server would drop the connection established by the JDBC driver.
Once the merge finished and DIH tried to resume indexing, the connection
was gone and it would fail the entire import.

I have never seen more than three merge levels happening at once, so a
value of 6 is probably overkill, but shouldn't be a problem.  The true
goal is to make sure that indexing never stops, not to push the system
limits.  The maxThreadCount parameter should prevent I/O from becoming a
problem.
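
For reference, here is a rough sketch of where these settings live in
solrconfig.xml on Solr 4.x.  The maxMergeCount of 6 is the value
discussed above; setting maxThreadCount explicitly to 1 is an assumption
based on the thread count mentioned earlier, so treat both numbers as a
starting point rather than a recommendation for your hardware.

```xml
<indexConfig>
  <!-- ConcurrentMergeScheduler runs merges in background threads. -->
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- Indexing stalls once more merges than this are pending. -->
    <int name="maxMergeCount">6</int>
    <!-- Assumed value: merges allowed to run at the same time. -->
    <int name="maxThreadCount">1</int>
  </mergeScheduler>
</indexConfig>
```

With one merge thread, raising maxMergeCount only lets more merges wait
in the queue; it does not add concurrent merge I/O.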

Thanks,
Shawn
