Re: Shards cannot be read after move to a different cluster

Robert Newson Fri, 08 Jul 2022 05:04:59 -0700

Hi,

The config file isn't monitored, so just changing the file won't help; you'd 
need to restart couchdb for the change to take effect.

Did you have anything bypassed in the first place, though?
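
A minimal sketch for checking that, assuming you paste in the text of one of 
your ini files (the function name and the sample text are made up for 
illustration, and the option names under [ioq.bypass] are examples only). It 
lists the options set to true in that one file; it does not merge the several 
config files couchdb reads at startup.

```python
import configparser


def enabled_bypasses(ini_text):
    """Return the [ioq.bypass] options set to true in one ini file's text."""
    # interpolation=None: treat '%' in values literally.
    # strict=False: tolerate repeated sections or options in the file.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    if not parser.has_section("ioq.bypass"):
        return []
    return sorted(
        key
        for key, value in parser.items("ioq.bypass")
        if value.strip().lower() == "true"
    )


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = "[ioq.bypass]\nread = true\nwrite = false\n"
    print(enabled_bypasses(sample))
```

An empty list means that file does not enable any bypass.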

Could you explain the replication problems you encountered?

I can say for sure that it is generally unsafe to modify shard files 
out-of-band while couchdb is running, as it appears you did at step 5. Couchdb 
may well have opened the shard files (created at step 3) and it holds them open 
in a cache.

I don't think we have written up how to do this properly (we strongly advise 
replication instead) but I did write an SO post a while ago: 
https://stackoverflow.com/questions/6676972/moving-a-shard-from-one-bigcouch-server-to-another-for-balancing
The sharding scheme for bigcouch is the same as for couchdb 3.x.

The essential difference is to _not_ create the clustered database at the 
target cluster until _after_ you've copied the shard files over. You then 
create the '_dbs' doc yourself. (Note that in bigcouch this database was 
called "dbs").
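
As a rough sketch of that last step (not an official procedure), the Python 
below derives a shard map document from the names of the copied shard files. 
The field names (shard_suffix, changelog, by_node, by_range) follow the shard 
map format described in the couchdb sharding documentation; the function name, 
node names and file paths are made up for illustration, and it assumes the 
shards/<range>/<db>.<suffix>.couch layout.

```python
import json
import re

# Assumed layout: shards/<range>/<db name>.<numeric suffix>.couch
SHARD_RE = re.compile(r"shards/([0-9a-f]{8}-[0-9a-f]{8})/(.+)\.(\d+)\.couch$")


def build_dbs_doc(shard_paths_by_node):
    """Build a shard map doc from {node name: [shard file paths]}."""
    by_node, by_range, changelog = {}, {}, []
    db_name = suffix = None
    for node, paths in sorted(shard_paths_by_node.items()):
        for path in sorted(paths):
            match = SHARD_RE.search(path)
            if match is None:
                raise ValueError("not a shard file path: %r" % path)
            shard_range, name, stamp = match.groups()
            if db_name is None:
                db_name, suffix = name, stamp
            elif (name, stamp) != (db_name, suffix):
                raise ValueError("mixed databases or suffixes: %r" % path)
            by_node.setdefault(node, []).append(shard_range)
            by_range.setdefault(shard_range, []).append(node)
            changelog.append(["add", shard_range, node])
    if db_name is None:
        raise ValueError("no shard files given")
    return {
        "_id": db_name,
        # The suffix is stored as a list of character codes, dot included.
        "shard_suffix": [ord(c) for c in "." + suffix],
        "changelog": changelog,
        "by_node": by_node,
        "by_range": by_range,
    }


if __name__ == "__main__":
    # Hypothetical node name and paths, for illustration only.
    doc = build_dbs_doc({
        "couchdb@node1.example": [
            "shards/00000000-7fffffff/mydb.1657267499.couch",
            "shards/80000000-ffffffff/mydb.1657267499.couch",
        ],
    })
    print(json.dumps(doc, indent=2))
```

The ranges and suffix are read from the file names rather than calculated, so 
the document describes the files you actually copied. Compare the result with 
the '_dbs' doc of an existing database on the target cluster before saving it.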

B.

> On 8 Jul 2022, at 09:08, Luca Morandini <[email protected]> wrote:
> 
> On Fri, 8 Jul 2022 at 17:17, Robert Newson <[email protected]> wrote:
>> 
>> Hi,
>> 
>> There's a bug in 3.1.0 that affects you. Namely that the default 5 second 
>> gen_server timeout is used for some requests if ioq bypass is enabled. 
>> Please check if your config has a [ioq.bypass] section and try again without 
>> bypasses for a time.
> 
> Thanks for taking the time to answer me.
> 
> I set all the settings of the [ioq.bypass] section to false, set the
> cluster in maintenance mode, waited a couple of minutes, then set
> maintenance to false... but no joy.
> 
> 
>> If you could explain your migration process in more detail perhaps we can 
>> find other explanations. I note that such migrations are better done online 
>> using replication, moving the files around is a bit more challenging.
> 
> I tried replication, but it failed, hence the shard files copy.
> 
> The procedure I followed (a tad simplified):
> - set the source cluster in maintenance mode;
> - copied the shard files to a shared disk;
> - created a database with the same name on the target cluster;
> - changed the database id on the copied shard files to match the
> newly-created one on the target cluster;
> - set the target cluster to maintenance mode;
> - copied the shard files from the shared disk to the target cluster
> data directories, making sure to get the shard directories right;
> - unset the maintenance mode on the target cluster.
> 
> The procedure above worked for a few databases (including one that
> -with replicas- was 6GB) but failed with the 200GB database.
> 
> Cheers,
> 
> Luca Morandini
