Hi guys, I'm crawling a file system folder and indexing 10 million docs. I add them in batches of 5,000 and commit every 50,000 docs. The problem I'm facing is that after each commit, indexing throughput (docs per second) drops lower and lower.
If I don't commit at all, I can index those docs very quickly and then commit once at the end. But once I start indexing docs _after_ that (for example, when new files are added to the folder), indexing slows down a lot as well.

Is it normal for Solr indexing speed to depend on the number of documents that are _already_ indexed? I'd expect it not to matter whether I start from scratch or add a document to a core that already holds a couple of million docs. It looks like Solr is either doing something linear in the index size, or there is some magic config parameter I'm not aware of.

I've read all the performance docs, and I've tried changing mergeFactor, autowarmCounts, and the buffer sizes - to no avail. I am using Solr 5.1.

Thanks!
Angel
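For reference, here is roughly the relevant part of my solrconfig.xml. The exact values below are just examples of what I last tried; mergeFactor, ramBufferSizeMB, and the autowarmCount on the caches are the knobs I experimented with:

```xml
<!-- Sketch of the settings I've been tuning; values are illustrative, not my exact config -->
<indexConfig>
  <ramBufferSizeMB>256</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</indexConfig>

<query>
  <!-- I tried autowarmCount at 0 to rule out cache warming after commits -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
</query>
```

None of these changes made a noticeable difference to the per-commit slowdown.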