After upgrading the server with SSDs I'm trying to speed up indexing. The server has 16 CPUs and more than 100G RAM. Java (1.8.0_92) has a 24G heap. Solr is 4.10.4. The plain XML data to load is 218G with about 96M records, which results in a single index of 299G.
I tried with 4, 8, 12 and 16 concurrent DIHs. 16 and 12 were too much for 16 CPUs, so my tests continued with 8 concurrent DIHs. Then I tried different <indexConfig> and <updateHandler> settings, but now I'm stuck. I can't figure out the best settings for bulk indexing. What I see is that indexing is "falling asleep" after some time. It then only produces del-files, like _11_1.del, _w_2.del, _h_3.del, ...

<indexConfig>
  <maxIndexingThreads>8</maxIndexingThreads>
  <ramBufferSizeMB>1024</ramBufferSizeMB>
  <maxBufferedDocs>-1</maxBufferedDocs>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">8</int>
    <int name="segmentsPerTier">100</int>
    <int name="maxMergedSegmentMB">512</int>
  </mergePolicy>
  <mergeFactor>8</mergeFactor>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
  <lockType>${solr.lock.type:native}</lockType>
  ...
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- no autocommit at all -->
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>
</updateHandler>

Each import is started with (full request in the P.S. below):

  command=full-import&optimize=false&clean=false&commit=false&waitSearcher=false

After indexing finishes there is a final optimize. My idea is: if 8 DIHs use 8 CPUs, then I have 8 CPUs left for merging (hence the 8 in maxIndexingThreads/maxMergeAtOnce/mergeFactor; see the P.P.S. for how I would pin the merge threads). During the import it should do no commit and no optimize. ramBufferSizeMB is high because I have plenty of RAM and I want to make use of the speed of RAM. segmentsPerTier is high to reduce merging. But there must be a misconfiguration somewhere, because indexing stalls.

Any idea what's going wrong?

Bernd
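P.S. For completeness, each of the concurrent imports is kicked off with a request like the one below, one per configured DIH handler. Host, port, core and handler path are placeholders here, not my actual names:

  # start one full-import on one DIH handler; no commit/optimize during the load
  curl 'http://localhost:8983/solr/core1/dataimport?command=full-import&optimize=false&clean=false&commit=false&waitSearcher=false'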
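P.P.S. To make the "8 CPUs left for merging" idea explicit: as far as I understand, ConcurrentMergeScheduler accepts thread limits as init args in solrconfig.xml. I have not tried this yet, so the values below are only untested guesses for this box:

  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- guesses: allow up to 8 queued merges, at most 4 running concurrently -->
    <int name="maxMergeCount">8</int>
    <int name="maxThreadCount">4</int>
  </mergeScheduler>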