Hi,

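There's a bug in 3.1.0 that affects you: if ioq bypass is enabled, some
requests use the default 5-second gen_server timeout. Your crash reports fit
that pattern. The find_header call is made with gen_server:call/2, which has no
explicit timeout, and it times out while the shard is being opened. Please
check whether your config has an [ioq.bypass] section and, if so, try again
with the bypasses disabled for a while.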
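
A minimal sketch of that check follows, assuming you use the node config HTTP
API rather than editing the ini files. It reads the section with
GET /_node/_local/_config/ioq.bypass and sets every class whose value is
"true" to "false" with PUT. BASE_URL, USER and PASSWORD are placeholders. The
script only switches off keys that are explicitly present in the section. It
does not know the built-in defaults of your CouchDB version, so run it against
each node and double-check the result.

```python
import base64
import json
import urllib.request

# Placeholder values: point these at one node and use your admin credentials.
BASE_URL = "http://127.0.0.1:5984"
USER, PASSWORD = "admin", "password"


def enabled_bypasses(section: dict) -> list:
    """Return the IO classes in an [ioq.bypass] section whose value reads "true"."""
    return sorted(key for key, value in section.items()
                  if str(value).strip().lower() == "true")


def _request(method: str, path: str, body=None):
    """Send an authenticated JSON request to the node and decode the JSON reply."""
    data = None if body is None else json.dumps(body).encode()
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def disable_bypasses() -> list:
    """Set every explicitly enabled bypass on this node to "false".

    Only keys already present in the section are touched; the built-in
    defaults of your CouchDB version are not inspected.
    """
    section = _request("GET", "/_node/_local/_config/ioq.bypass")
    changed = enabled_bypasses(section)
    for io_class in changed:
        # The config API expects the new value as a JSON-encoded string.
        _request("PUT", f"/_node/_local/_config/ioq.bypass/{io_class}", "false")
    return changed
```

Calling disable_bypasses() on a node returns the classes it switched off. An
empty list means no bypass was explicitly set to true on that node.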

If you can describe your migration process in more detail, we may be able to
find other explanations. Note that such migrations are better done online
using replication; moving the shard files around by hand is more challenging.
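
For the replication route, one option is to create a document in the
_replicator database on the new cluster. The sketch below builds such a
document using the standard source, target and create_target fields, and it
passes credentials as Authorization headers. The host names, the database name
mydb and the document ID are hypothetical placeholders. With create_target the
target database is created with the cluster's default shard settings. You may
prefer to create it yourself first with the q query parameter so that you
control its shard count.

```python
import base64
import json
import urllib.request


def basic_auth(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return "Basic " + token


def replication_doc(source_url: str, target_url: str,
                    source_auth: str, target_auth: str,
                    continuous: bool = False) -> dict:
    """Build a _replicator document that copies source_url into target_url."""
    return {
        "source": {"url": source_url, "headers": {"Authorization": source_auth}},
        "target": {"url": target_url, "headers": {"Authorization": target_auth}},
        "create_target": True,
        # continuous=True keeps the target in sync until clients are switched over.
        "continuous": continuous,
    }


def start_replication(cluster_url: str, doc_id: str, doc: dict, auth: str) -> dict:
    """PUT the document into _replicator on cluster_url and return the reply."""
    req = urllib.request.Request(
        f"{cluster_url}/_replicator/{doc_id}",
        data=json.dumps(doc).encode(),
        method="PUT",
        headers={"Content-Type": "application/json", "Authorization": auth},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, pass replication_doc("http://old-node:5984/mydb",
"http://new-node:5984/mydb", auth, auth) to start_replication on
http://new-node:5984. You can then follow progress with
GET /_scheduler/docs on the new cluster.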

B.

> On 7 Jul 2022, at 08:28, Luca Morandini <luca.morandi...@gmail.com> wrote:
> 
> Dear All,
> 
> I moved some CouchDB 3.1.0 databases to a new 4-node cluster by
> copying the shard files.
> 
> The operation worked for 5 out of 6 databases; the biggest database
> (about 200GB, 12 shards, 2 replicas) did not come online on the new
> cluster.
> 
> I suspect high disk latency, but... could someone shed some light on this?
> 
> The relevant logs are:
> 
> [info] 2022-07-06T04:30:44.697901Z couchdb@10.0.0.80
> <0.228.0> -------- db
> shards/95555553-aaaaaaa7/twitter.1657067184 died with reason
> {timeout,{gen_server,call,[<0.26790.5>,find_header]}}
> [error] 2022-07-06T04:30:44.698269Z couchdb@10.0.0.80
> <0.26789.5> -------- CRASH REPORT Process
> (<0.26789.5>) with 2 neighbors exited with reason:
> {timeout,{gen_server,call,[<0.26790.5>,find_header]}} at
> gen_server:call/2(line:206) <= couch_file:read_header/1(line:378)
> <= couch_bt_engine:init/2(line:157) <=
> couch_db_engine:init/3(line:775) <=
> couch_db_updater:init/1(line:43) <=
> proc_lib:init_p_do_apply/3(line:247); initial_call:
> {couch_db_updater,init,['Argument__1']}, ancestors:
> [<0.26784.5>], message_queue_len: 0, messages: [], links:
> [<0.26784.5>,<0.26790.5>], dictionary:
> [{io_priority,{db_update,<<"shards/95555553-aaaaaaa7/twitter.16570671...">>}},...],
> trap_exit: false, status: running, heap_size: 610, stack_size: 27,
> reductions: 250
> [error] 2022-07-06T04:56:10.077664Z couchdb@10.0.0.80
> <0.6591.6> -------- CRASH REPORT Process
> (<0.6591.6>) with 2 neighbors exited with reason:
> {timeout,{gen_server,call,[<0.6593.6>,find_header]}} at
> gen_server:call/2(line:206) <= couch_file:read_header/1(line:378)
> <= couch_bt_engine:init/2(line:157) <=
> couch_db_engine:init/3(line:775) <=
> couch_db_updater:init/1(line:43) <=
> proc_lib:init_p_do_apply/3(line:247); initial_call:
> {couch_db_updater,init,['Argument__1']}, ancestors:
> [<0.6584.6>], message_queue_len: 0, messages: [], links:
> [<0.6584.6>,<0.6593.6>], dictionary:
> [{io_priority,{db_update,<<"shards/95555553-aaaaaaa7/twitter.16570671...">>}},...],
> trap_exit: false, status: running, heap_size: 610, stack_size: 27,
> reductions: 250
> [info] 2022-07-06T04:56:10.077711Z couchdb@10.0.0.80
> <0.228.0> -------- db
> shards/95555553-aaaaaaa7/twitter.1657067184 died with reason
> {timeout,{gen_server,call,[<0.6593.6>,find_header]}}
> [info] 2022-07-07T06:44:13.863950Z couchdb@10.0.0.80
> <0.228.0> -------- db
> shards/95555553-aaaaaaa7/twitter.1657067184 died with reason
> {timeout,{gen_server,call,[<0.9139.29>,find_header]}}
> [error] 2022-07-07T06:44:13.864516Z couchdb@10.0.0.80
> <0.9152.29> -------- CRASH REPORT Process
> (<0.9152.29>) with 2 neighbors exited with reason:
> {timeout,{gen_server,call,[<0.9139.29>,find_header]}} at
> gen_server:call/2(line:206) <= couch_file:read_header/1(line:378)
> <= couch_bt_engine:init/2(line:157) <=
> couch_db_engine:init/3(line:775) <=
> couch_db_updater:init/1(line:43) <=
> proc_lib:init_p_do_apply/3(line:247); initial_call:
> {couch_db_updater,init,['Argument__1']}, ancestors:
> [<0.9136.29>], message_queue_len: 0, messages: [], links:
> [<0.9136.29>,<0.9139.29>], dictionary:
> [{io_priority,{db_update,<<"shards/95555553-aaaaaaa7/twitter.16570671...">>}},...],
> trap_exit: false, status: running, heap_size: 610, stack_size: 27,
> reductions: 250
> 
> Cheers,
> 
> Luca Morandini
