hi guys,

I'm crawling a file system folder and indexing 10 million docs, adding
them in batches of 5,000 and committing every 50,000 docs. The problem I'm
facing is that after each commit, the indexing rate (docs per second)
drops further and further.

If I don't commit at all, I can index all those docs very quickly and then
commit once at the end, but once I start indexing docs _after_ that (for
example when new files are added to the folder), indexing slows down a lot
as well.

Is it normal for Solr's indexing speed to depend on the number of
documents that are _already_ indexed? I'd expect it not to matter whether
I start from scratch or index a document into a core that already holds a
couple of million docs. It looks like Solr is either doing something
linear in the index size, or there is some magic config parameter I'm not
aware of.

I've read all the performance docs, and I've tried changing mergeFactor,
autowarmCount, and the buffer sizes - to no avail.
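For reference, this is roughly the kind of thing I experimented with in
solrconfig.xml (the values below are just examples of what I tried, not my
exact settings):

```xml
<!-- indexConfig section: larger RAM buffer, fewer merges -->
<indexConfig>
  <ramBufferSizeMB>256</ramBufferSizeMB>
  <mergeFactor>20</mergeFactor>
</indexConfig>

<!-- query caches: autowarmCount set to 0 so commits don't
     spend time re-warming caches -->
<query>
  <filterCache class="solr.FastLRUCache"
               size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache"
                    size="512" initialSize="512" autowarmCount="0"/>
</query>
```

None of these made a noticeable difference to the slowdown.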

I am using Solr 5.1.

Thanks !
Angel
