Hi Jan,

Thanks for the info. Jake is now off the project, but I've been doing some more 
work on it in the meantime.

We have q=8, so yes, you're right - with 90 databases that's 90 * 8 = 720 potential 
open shards, which is more than the max_dbs_open limit of 500 - good tip!
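
In case it's useful to anyone else following the thread: we confirmed q with a plain 
GET on one of the databases (the "cluster" section of the response shows q and n), 
and we bump the limit in local.ini on each node. Roughly like this - the port, the 
<dbname> placeholder and the value of 1000 are just our setup, so treat it as a 
sketch rather than a recipe:

    # check q and n for an existing database
    curl -s http://localhost:5984/<dbname>
    # -> ... "cluster":{"q":8,"n":2, ...} ...

    ; local.ini on each node, followed by a restart of that node
    [couchdb]
    max_dbs_open = 1000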

I've noticed something I can't explain and was wondering if you could shed some 
light.

On our test CouchDB 2.0 cluster, which holds 720 DB shards per node (90 databases * 
q=8, with n=2 spread across 2 nodes), we've noticed the nodes stabilise at the 
following memory usage for a given max_dbs_open setting:

max_dbs_open | Memory
500          | 2 GB
800          | 4 GB
1000         | 6 GB

This is the equilibrium each node reaches while under load from our view-building and 
conflict-resolution scheduled tasks, with no replication or other usage.

Running a node with only 4 GB of RAM and max_dbs_open = 1000, the CouchDB process's 
memory usage grew until the kernel's OOM killer terminated it; it then built up again 
under load and was killed again.
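
For what it's worth, this is roughly how we've been watching the memory side of 
things (the node name, jq and the exact endpoint path are just what works on our 
boxes - I gather the node-local _system endpoint moved around a bit between 2.x 
releases, so adjust to taste):

    # memory as reported by the Erlang VM inside CouchDB
    curl -s http://localhost:5984/_node/couchdb@<host>/_system | jq .memory

    # resident memory of the CouchDB (beam.smp) process
    ps -o rss,comm -C beam.smp

    # confirm the OOM kill in the kernel log
    dmesg -T | grep -iE 'out of memory|killed process'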

Questions:
1. Can you explain why there is a difference between max_dbs_open = 800 and 
max_dbs_open = 1000, even though each node has fewer than 800 DB shards in total? 
(Does it cache more than one copy of a shard?)
2. Is there a reliable way to predict how much memory the server is going to 
need?
3. Is the process running out of memory and being killed by the kernel expected 
in that situation, or have I missed a config item somewhere that caps what 
CouchDB will use?

Thanks in advance!

Darrell Rodgers

-----Original Message-----
From: Jan Lehnardt <[email protected]> 
Sent: Thursday, 14 March 2019 6:16 PM
To: [email protected]
Subject: Re: Getting an error "all_dbs_active" running a CouchDB 2.3 cluster

Heya Jake,

what is your q value for your databases on the source and target clusters?

The default is 8, so 8*90 gives us 720 potential open dbs.

It could just be that under normal operation, your 2.0 cluster never has to 
open all shards at the same time, but now that you are running a migration, the 
target cluster has to.

IIRC there were no semantic changes in this handling between 2.0 and 2.3.

Best
Jan
-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/


> On 14. Mar 2019, at 11:05, Jake Kroon <[email protected]> 
> wrote:
> 
> Hi,
>  
> I’m in the process of trying to migrate a CouchDB cluster from 2.0 to 2.3, by 
> creating a new cluster and replicating the databases over to it, and 
> eventually plan to switch over to the new one. Generally this process is 
> going fine, but I’m getting errors similar to the following when running my 
> applications against the new cluster:
>  
> [error] 2019-02-21T07:04:51.213276Z couchdb@ip <0.17397.4590> f346ddb688 
> rexi_server: from: couchdb@ip (<0.32026.4592>) mfa: fabric_rpc:map_view/5 
> error:{badmatch,{error,all_dbs_active}} 
> [{fabric_rpc,map_view,5,[{file,"src/fabric_rpc.erl"},{line,148}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
>  
> I’m using the default max_dbs_open value of 500 (which is preset in the 
> default.ini file). As far as I understand it, this should be plenty, and it’s 
> what I’m successfully using on my current 2.0 cluster with no errors. I may 
> be misunderstanding how this setting works though.
>  
> I have about 90 databases in the cluster, and all I’m currently running is a 
> couple of scripts:
>  
>       • A “build views” script that runs every hour, that goes through each 
> database and queries each of the views (in series).
>       • A “conflict resolver” script that runs every 15 minutes, that queries 
> all databases for conflicts and then performs custom logic to deal with 
> conflicts (though there won’t be any conflicts on our new server at this 
> time, so it’s just querying the conflicts view on each database)
>  
> I also previously had continuous bidirectional replication set up between the 
> new cluster and the old one, and the “all_dbs_active” error was happening 
> quite often (a couple of times per hour). I’ve cancelled all the replication 
> jobs and the error has reduced to about 1 or 2 instances per day.
>  
> I haven’t yet tried increasing the max_dbs_open value (which seems to be a 
> common suggestion for dealing with the “all_dbs_active” error), because the 
> live 2.0 cluster is working fine with the default value of 500, and has 
> higher load on it than the new 2.3 cluster.
>  
> I was wondering if anyone has any suggestions on what I should look at to try 
> to solve this issue?
>  
> I’m running the cluster on Ubuntu 18.04 LTS.
>  
> Thanks!
> Jake Kroon
> Software Engineer
>  
> D: +61 8 9318 6949
> E: [email protected]


