So would writing some SolrJ code that does the same job as the DIH script, and running it concurrently, solve my problem? I'm not using Tika.
I don't think that DIH is my problem, even if it is not the best solution right now.
Nevertheless, you are right that SolrJ has higher performance, but what if I have
the same problems with SolrJ as with DIH? If it runs with DIH, it should run with
SolrJ, with an additional performance boost.

Bernd

On 27.07.2016 at 16:03, Erick Erickson wrote:
> I'd actually recommend you move to a SolrJ solution
> or similar. Currently, you're putting a load on the Solr
> servers (especially if you're also using Tika) in addition
> to all indexing etc.
>
> Here's a sample:
> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
>
> Dodging the question I know, but DIH sometimes isn't
> the best solution.
>
> Best,
> Erick
>
> On Wed, Jul 27, 2016 at 6:59 AM, Bernd Fehling
> <bernd.fehl...@uni-bielefeld.de> wrote:
>> After enhancing the server with SSDs, I'm trying to speed up indexing.
>>
>> The server has 16 CPUs and more than 100G of RAM.
>> Java (1.8.0_92) has 24G.
>> Solr is 4.10.4.
>> The plain XML data to load is 218G, with about 96M records.
>> This will result in a single index of 299G.
>>
>> I tried with 4, 8, 12 and 16 concurrent DIHs.
>> 16 and 12 were too much for 16 CPUs, so my tests continued with 8
>> concurrent DIHs.
>> Then I tried different <indexConfig> and <updateHandler> settings, but
>> now I'm stuck.
>> I can't figure out the best settings for bulk indexing.
>> What I see is that the indexing "falls asleep" after some time.
>> It is only producing del-files, like _11_1.del, _w_2.del, _h_3.del, ...
>>
>> <indexConfig>
>>   <maxIndexingThreads>8</maxIndexingThreads>
>>   <ramBufferSizeMB>1024</ramBufferSizeMB>
>>   <maxBufferedDocs>-1</maxBufferedDocs>
>>   <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>>     <int name="maxMergeAtOnce">8</int>
>>     <int name="segmentsPerTier">100</int>
>>     <int name="maxMergedSegmentMB">512</int>
>>   </mergePolicy>
>>   <mergeFactor>8</mergeFactor>
>>   <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
>>   <lockType>${solr.lock.type:native}</lockType>
>>   ...
>> </indexConfig>
>>
>> <updateHandler class="solr.DirectUpdateHandler2">
>>   ### no autocommit at all
>>   <autoSoftCommit>
>>     <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>>   </autoSoftCommit>
>> </updateHandler>
>>
>> command=full-import&optimize=false&clean=false&commit=false&waitSearcher=false
>>
>> After indexing finishes there is a final optimize.
>>
>> My idea is that if 8 DIHs use 8 CPUs, I have 8 CPUs left for merging
>> (maxIndexingThreads/maxMergeAtOnce/mergeFactor).
>> It should do no commit and no optimize.
>> ramBufferSizeMB is high because I have plenty of RAM and want to make
>> use of its speed.
>> segmentsPerTier is high to reduce merging.
>>
>> But somewhere there is a misconfiguration, because indexing gets stalled.
>>
>> Any idea what's going wrong?
>>
>>
>> Bernd
>>
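[Editor's note] One detail in the quoted config worth flagging: `<mergeFactor>` is the legacy knob for `LogMergePolicy` and is redundant (and ambiguous) next to an explicit `TieredMergePolicy`. Also, `segmentsPerTier=100` allows far more segments to accumulate than `maxMergeAtOnce=8` can merge away, and `ConcurrentMergeScheduler` stalls incoming indexing threads when the merge backlog grows too large, which would match the "falling asleep" symptom. A hedged sketch of a leaner bulk-indexing `<indexConfig>` (the values are illustrative suggestions, not tested against this 218G data set):

```xml
<indexConfig>
  <maxIndexingThreads>8</maxIndexingThreads>
  <ramBufferSizeMB>1024</ramBufferSizeMB>
  <!-- Keep segmentsPerTier close to maxMergeAtOnce so the merge
       backlog stays bounded; 100 tiers merged 8 at a time lets
       hundreds of small segments pile up. -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <int name="maxMergedSegmentMB">5120</int>
  </mergePolicy>
  <!-- No <mergeFactor>: it duplicates what the explicit
       TieredMergePolicy above already configures. -->
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">4</int>
    <int name="maxMergeCount">8</int>
  </mergeScheduler>
  <lockType>${solr.lock.type:native}</lockType>
</indexConfig>
```

Raising `maxMergedSegmentMB` matters for a 299G index: with the quoted 512MB cap, a full optimize at the end has to rewrite everything from sub-512MB segments, and normal merging stops consolidating early.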
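[Editor's note] The SolrJ route Erick suggests can be sketched roughly as below. This is a minimal, hedged example (not the code from the linked blog post): it assumes SolrJ 4.10.x, where `ConcurrentUpdateSolrServer` queues documents and drains them with background threads; the URL, core name, and field names are placeholders to adapt to the actual schema and XML record reader.

```java
import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void main(String[] args) throws IOException, SolrServerException {
        // Placeholder URL/core. Queue up to 10000 docs and drain them with
        // 8 background threads, mirroring the 8 concurrent DIHs above.
        ConcurrentUpdateSolrServer server =
                new ConcurrentUpdateSolrServer("http://localhost:8983/solr/core1", 10000, 8);
        try {
            // Replace this loop with a reader over the real XML records.
            for (int i = 0; i < 1000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);         // field names are placeholders
                doc.addField("title_t", "title " + i);
                server.add(doc);                        // non-blocking: queued for workers
            }
            server.blockUntilFinished();                // wait for the queue to drain
            server.commit();                            // one commit at the end, matching
                                                        // commit=false + a final commit
        } finally {
            server.shutdown();
        }
    }
}
```

The key difference from DIH is that parsing and document construction run in the client JVM, so the Solr server spends its CPUs only on indexing and merging.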