I'd like some suggestions on how to improve indexing performance in the following scenario.

I'm uploading 1M docs to Solr. Every doc has:

    id: sequential number
    title: small string
    date: date
    body: 1kb of text

Here are my benchmarks (they are all single executions, not averages from multiple executions):

1) Using the UpdateRequestHandler and streaming docs from a CSV file on the same disk as Solr
   auto commit every 15s with openSearcher=false, and commit after the last document
   total time: 143035ms

1.1) Using the UpdateRequestHandler and streaming docs from a CSV file on the same disk as Solr
   auto commit every 15s with openSearcher=false, and commit after the last document
   <ramBufferSizeMB>500</ramBufferSizeMB>
   <maxBufferedDocs>100000</maxBufferedDocs>
   total time: 134493ms

1.2) Using the UpdateRequestHandler and streaming docs from a CSV file on the same disk as Solr
   auto commit every 15s with openSearcher=false, and commit after the last document
   <mergeFactor>30</mergeFactor>
   total time: 143134ms

2) Using a SolrJ client from another PC on the LAN (100Mbps)
   HttpSolrServer with javabin format
   adding documents to the server in batches of 1k docs ( server.add( <collection> ) )
   auto commit every 15s with openSearcher=false, and commit after the last document
   total time: 139022ms

3) Using a SolrJ client from another PC on the LAN (100Mbps)
   ConcurrentUpdateSolrServer with javabin format
   adding documents to the server in batches of 1k docs ( server.add( <collection> ) )
   server queue size=20k
   server threads=4
   no auto commit, and commit every 100k docs
   total time: 167301ms

--On the Solr server--

CPU averages 25% (at best 100% of 1 core).
IO is still far from being saturated; iostat gives a pattern like this (every 5s):

    time(s)  %util
    100      45,20
    105       1,68
    110      17,44
    115      76,32
    120       2,64
    125      68
    130       1,28

I thought that with ConcurrentUpdateSolrServer I would be able to max out CPU or IO, but I wasn't.
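To put the total times in perspective, here is a rough sketch that converts each run into docs/s and an approximate payload rate. It assumes about 1 KB per doc (the body only, ignoring the other fields and any protocol overhead), and the class and method names are purely illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IndexingThroughput {

    // Docs indexed per second for a run of `docs` documents that took `millis` ms.
    static double docsPerSecond(long docs, long millis) {
        return docs * 1000.0 / millis;
    }

    // Approximate payload rate in MB/s (1 MB = 1,000,000 bytes).
    static double megabytesPerSecond(long docs, long millis, int bytesPerDoc) {
        return docsPerSecond(docs, millis) * bytesPerDoc / 1_000_000.0;
    }

    public static void main(String[] args) {
        final long docs = 1_000_000L;
        // Assumption: ~1 KB per doc (body only); real payloads are somewhat larger.
        final int bytesPerDoc = 1024;

        // Total times in ms, taken from the benchmarks listed above.
        Map<String, Long> runs = new LinkedHashMap<>();
        runs.put("1", 143035L);
        runs.put("1.1", 134493L);
        runs.put("1.2", 143134L);
        runs.put("2", 139022L);
        runs.put("3", 167301L);

        for (Map.Entry<String, Long> run : runs.entrySet()) {
            long millis = run.getValue();
            System.out.printf("benchmark %-4s %8.1f docs/s  ~%5.2f MB/s%n",
                    run.getKey(),
                    docsPerSecond(docs, millis),
                    megabytesPerSecond(docs, millis, bytesPerDoc));
        }
    }
}
```

Under that 1 KB assumption, every run works out to single-digit MB/s, which is below the roughly 12.5 MB/s a 100Mbps link can carry, so the network does not look like the obvious limit for the SolrJ runs either.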
With ConcurrentUpdateSolrServer I can't rely on auto commit, otherwise I get an OutOfMemory error, and I found that committing every 100k docs gives worse performance than auto commit every 15s (benchmark 3 with HttpSolrServer took 193515ms).

I'd really like to understand why I can't max out the resources on the server hosting Solr (disk above all), and what I'm doing wrong with ConcurrentUpdateSolrServer.

Thanks