On 5/15/2012 3:42 PM, Jon Drukman wrote:
> I fixed it for now by upping the wait_timeout on the mysql server.
> Apparently Solr doesn't like having its connection yanked out from under
> it and/or isn't smart enough to reconnect if the server goes away.  I'll
> set it back the way it was and try your readOnly option.

I use DIH with MySQL. The only time I ran into timeouts while importing, the cause was segment merging. A first-level merge happens when the number of segments reaches mergeFactor. A second-level merge happens when the number of merged segments reaches mergeFactor. A third-level merge happens when you get enough segments created by second-level merges. It's probably possible for this to extend to a fourth level and beyond, though I have not seen that personally.

When there are multiple merges happening at the same time (on 3.4 and earlier; 3.5 may have changed this), only one of them actually runs and the others are paused. Eventually, if you have a slow I/O system (SATA RAID1 or slower) and a big enough index, your full-import can reach a state where merges at all three levels are happening at the same time. When this happens, indexing stops. If it stops for long enough, the MySQL server will close the connection and DIH will fail once it begins indexing again.
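
To make the cascade concrete, here is a minimal Python sketch of the merge levels as described above. It is a toy model only, not Lucene's actual merge policy code: it assumes every flush produces one segment and that exactly mergeFactor segments at one level merge into a single segment at the next. The function name simulate_merges and the values 10 and 35 are chosen purely for illustration.

```python
def simulate_merges(merge_factor, flushes):
    """Toy model of cascading merges; NOT Lucene's real merge policy.

    Each flush adds one level-0 segment. Whenever a level holds
    merge_factor segments, they merge into one segment on the next level.
    Returns {merge level: flush number at which that level first merged}.
    """
    counts = [0]        # counts[i] = segments currently sitting at level i
    first_merge = {}
    for flush in range(1, flushes + 1):
        counts[0] += 1
        level = 0
        # A single flush can trigger a chain of merges up the levels.
        while counts[level] == merge_factor:
            counts[level] = 0
            if level + 1 == len(counts):
                counts.append(0)
            counts[level + 1] += 1
            first_merge.setdefault(level + 1, flush)
            level += 1
    return first_merge


for mf in (10, 35):
    print(mf, simulate_merges(mf, mf ** 3))
# 10 {1: 10, 2: 100, 3: 1000}
# 35 {1: 35, 2: 1225, 3: 42875}
```

In this model the first flush that sets off first-, second- and third-level merges together is flush number mergeFactor cubed, so a larger mergeFactor pushes that point much further into the import.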

Since my DIH config consists of a single SELECT statement that runs for the entire three-hour duration of the import, adding reconnect capability to DIH would not help. The only way to make it work right is to configure things such that Solr never stops indexing. I did this by increasing my mergeFactor and, when I installed Solr 3.5, by switching to maxMergeAtOnce, segmentsPerTier, and maxMergeAtOnceExplicit. I also increased maxMergeCount under mergeScheduler. Here's my current indexDefaults section:

<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">4</int>
  </mergeScheduler>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <maxFieldLength>32768</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>10000</commitLockTimeout>
  <lockType>native</lockType>
</indexDefaults>

Thanks,
Shawn
