On 6/6/2013 4:13 AM, Sebastian Steinfeld wrote:
The number of documents I want to index is 8 million. The first 1.6 million are
indexed in 2 minutes, but the complete import takes nearly 2 hours.
The size of the index on the hard drive is 610MB.
I started the solr server with 2GB memory.
I read that the duration of indexing might be connected to the batch size, so I
increased the batchSize in the dataSource to 10,000, but this didn't make any
difference.
I also tried to disable the autocommit, which is configured in the
solrconfig.xml. I disabled it by commenting it out, but this also didn't make
any difference.
If you are importing from MySQL, you actually want the batchSize to be
-1. This streams the results so they don't take up large blocks of
memory. Other JDBC drivers have different ways of configuring this mode
of operation. You fully redacted the driver and URL in your config
file, so I don't know what you are using.
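As a rough sketch only, assuming MySQL with the Connector/J driver, a
dataSource with streaming enabled might look like the following. The driver
class, URL, credentials, entity name, and query are placeholders, not taken
from your config:

```xml
<dataConfig>
  <!-- batchSize="-1" asks the MySQL driver to stream rows one at a time
       instead of buffering the whole result set in memory.
       driver, url, user and password below are placeholder values. -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/exampledb"
              user="db_user"
              password="db_password"
              batchSize="-1"/>
  <document>
    <!-- Hypothetical entity and query, for illustration only. -->
    <entity name="item" query="SELECT id, title FROM item"/>
  </document>
</dataConfig>
```

If the database is not MySQL, the batchSize value that works best will differ,
so check the documentation for that JDBC driver.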
2GB of Java heap for Solr is probably not enough. It's likely that once
your index gets big enough, Solr is starved for memory and has to
perform constant garbage collections to free up enough for basic
operation. I would bet that you also don't have enough free memory for
the OS to cache the index well:
http://wiki.apache.org/solr/SolrPerformanceProblems
If you are using 4.x with the updateLog turned on, then you want
autoCommit enabled with openSearcher set to false. This is covered on
the wiki page I linked.
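A minimal sketch of what that could look like in the updateHandler section of
solrconfig.xml on 4.x. The 15-second maxTime is an assumed example value, not
a recommendation tuned for your index:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log; with this enabled, regular hard commits keep it
       from growing without bound during a long import. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- Hard commit every 15 seconds (assumed value). openSearcher=false
       flushes to disk without opening a new searcher, so the commit is
       cheap and does not change which documents are visible. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With this setup, newly indexed documents only become searchable after an
explicit commit or a soft commit, for example the commit issued when the
import finishes.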
Thanks,
Shawn