I am running CouchDB 1.6.1. The database has about 30 million documents.

I am trying to import a database into ElasticSearch. I have tried two methods: the CouchDB River with ES 1.x, and Logstash 2.2 with the couchdb_changes input plugin against either ES 1.x or 2.x.

No matter which method I use, the speed is great at first: a few thousand documents per second. But after about 1 or 2 million documents have been imported, the process slows down a lot, to maybe 10 documents per second if I'm lucky.

ElasticSearch is not overloaded: CPU usage is very low, the instance is not swapping, and I gave ES enough RAM.

River and Logstash are different applications, yet both read from _changes, and both behave the same way: very fast at first, then a massive slowdown.

I think I've noticed the same behavior even when replicating from one CouchDB instance to another. There the speed is much greater, probably because CouchDB replication is more parallel, but it also tends to go very fast initially and then slow down after a few million documents.

What is the reason for this? Why is _changes fast at first, but slower and slower as you make progress and get closer to the end of the DB?
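
One way to tell whether the slowdown comes from CouchDB itself or from the consumers would be to time the _changes feed directly, with ElasticSearch out of the picture. Below is a minimal sketch of that idea, not something I have run against the 30M database. It assumes the normal (non-continuous) feed, paged with the standard `since` and `limit` query parameters; the helper names `fetch_changes` and `time_changes`, the batch size, and the host and database name in the usage note are all made up for illustration.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_changes(base_url, db, since, limit):
    """Fetch one page of the _changes feed (normal feed, without docs)."""
    query = urlencode({"since": since, "limit": limit})
    with urlopen(f"{base_url}/{db}/_changes?{query}") as resp:
        return json.load(resp)


def time_changes(fetch_page, batch_size=10000, max_batches=None):
    """Page through _changes, recording (last_seq, rows, seconds) per batch.

    fetch_page is any callable taking (since, limit) and returning the
    decoded JSON body, which has a "results" list and a "last_seq" value.
    """
    since = 0
    timings = []
    while max_batches is None or len(timings) < max_batches:
        start = time.perf_counter()
        page = fetch_page(since, batch_size)
        elapsed = time.perf_counter() - start
        rows = len(page["results"])
        if rows == 0:
            break  # reached the end of the feed
        since = page["last_seq"]  # resume the next batch from here
        timings.append((since, rows, elapsed))
    return timings
```

It could be called as `time_changes(lambda s, n: fetch_changes("http://localhost:5984", "mydb", s, n))`, where the URL and database name are placeholders. If the seconds per batch grow as the sequence number advances, the slowdown is on the CouchDB side; if they stay flat, the bottleneck is more likely in how River or Logstash consume the feed.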

--
Florin Andrei
http://florin.myip.org/
