Hello, I'm using CouchDB for our company's billing platform.
We have 4 dedicated servers (32-64 GB of RAM, 3-8 TB of disk with SSD cache) in the same datacenter. All servers serve the same set of databases (about 40 databases per machine), with all-to-all replication via the _replicator database. The databases vary widely in size, from a few documents to several hundred million documents; 2 databases are 500 GB+. Documents are simple, without complex structure, and almost none have attachments.

We have an application to maintain all of these replications, and here is why: we routinely run into unpredictable replication failures. For example, a document in the _replicator database can have status = "triggered" while there is no corresponding task running on the server at that moment. Or a document can even be missing its "source" field for a few minutes every day, on every server.

Replications crash every hour with unclear errors like "source database is out of sync, please increase max_dbs_open". max_dbs_open is 800 on every server, and there are fewer than 50 databases. So even 50 databases multiplied by 3 replications each (150) is well below the limit.

Creating documents in the _replicator database is hard too. Example:

# first, deleting the old one
[Fri, 20 Sep 2013 15:41:21 GMT] [info] [<0.24052.0>] 83.240.73.210 - - DELETE /_replicator/example.com_db?rev=10-89450b554d11bf9a6d7e15a136ae663f 200
# deleted
[Fri, 20 Sep 2013 15:41:24 GMT] [info] [<0.22050.0>] 83.240.73.210 - - GET /_replicator/example.com_db?revs_info=true 404
# creating a new one with the same id
[Fri, 20 Sep 2013 15:41:45 GMT] [info] [<0.25844.0>] 176.9.143.85 - - HEAD /_replicator/example.com_db 404
# seems to be created
[Fri, 20 Sep 2013 15:41:45 GMT] [info] [<0.25845.0>] 176.9.143.85 - - PUT /_replicator/example.com_db 201
# where is it?..
[Fri, 20 Sep 2013 15:41:51 GMT] [info] [<0.22050.0>] 83.240.73.210 - - GET /_replicator/example.com_db?revs_info=true 404
# next try, creating
[Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.27720.0>] 176.9.143.85 - - HEAD /_replicator/example.com_db 404
# and now it is created
[Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.27730.0>] 176.9.143.85 - - PUT /_replicator/example.com_db 201
# because the replication starts successfully
[Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.111.0>] Attempting to start replication `c0071dc985cbc3df3a225a6d75f0be7b+continuous` (document `example.com_db`).
[Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.28092.0>] Document `example.com_db` triggered replication `c0071dc985cbc3df3a225a6d75f0be7b+continuous`
[Fri, 20 Sep 2013 15:42:05 GMT] [info] [<0.28090.0>] starting new replication `c0071dc985cbc3df3a225a6d75f0be7b+continuous` at <0.28092.0> (`http://replica:*****@example.com:5984/db/` -> `db`)

Does anyone else use CouchDB in a similar manner? Am I the only one experiencing such problems?

P.S. We are using CouchDB 1.3.1 and 1.4.0 on Gentoo Linux.

--
Best regards,
Alexey Elfman
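
For reference, the "triggered but no task" symptom described above could be detected with something like the Python sketch below. It is only an illustration under assumptions, not the actual maintenance application. It assumes that _replicator documents carry `_replication_state` and `_replication_id` fields, and that entries from `/_active_tasks` carry `type` and `replication_id` fields whose value may have option suffixes such as `+continuous`. The function name `find_stale_triggered` and the second sample document are hypothetical. The function works on already-parsed JSON, so fetching the data over HTTP is left out.

```python
def find_stale_triggered(replicator_docs, active_tasks):
    """Return ids of _replicator docs marked "triggered" that have no running task.

    replicator_docs: list of dicts, as parsed from the _replicator database.
    active_tasks:    list of dicts, as parsed from GET /_active_tasks.
    """
    running = set()
    for task in active_tasks:
        if task.get("type") == "replication":
            # Assumption: task ids look like "<base id>+continuous";
            # compare on the base id only.
            running.add(str(task.get("replication_id", "")).split("+", 1)[0])

    stale = []
    for doc in replicator_docs:
        if doc.get("_replication_state") != "triggered":
            continue  # only "triggered" docs are expected to have a task
        if doc.get("_replication_id") not in running:
            stale.append(doc["_id"])
    return stale


# Sample data: the first document mirrors the log above,
# the second one is a hypothetical stale entry.
docs = [
    {"_id": "example.com_db",
     "_replication_state": "triggered",
     "_replication_id": "c0071dc985cbc3df3a225a6d75f0be7b"},
    {"_id": "hypothetical_other_db",
     "_replication_state": "triggered",
     "_replication_id": "ffffffffffffffffffffffffffffffff"},
]
tasks = [
    {"type": "replication",
     "replication_id": "c0071dc985cbc3df3a225a6d75f0be7b+continuous"},
]

print(find_stale_triggered(docs, tasks))
```

With the sample data, only `hypothetical_other_db` is reported, because `example.com_db` has a matching running task. A check like this only shows that a document and the task list disagree at one moment; it does not explain why the replication task went away.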
