Dear All,

I moved some CouchDB 3.1.0 databases to a new 4-node cluster by copying the shard files.
The operation worked for 5 out of 6 databases; the biggest database (about 200GB, 12 shards, 2 replicas) did not come online on the new cluster.

I suspect high disk latency, but... could someone shed some light on this?

The relevant logs are:

[info] 2022-07-06T04:30:44.697901Z couchdb@10.0.0.80 <0.228.0> -------- db shards/95555553-aaaaaaa7/twitter.1657067184 died with reason {timeout,{gen_server,call,[<0.26790.5>,find_header]}}

[error] 2022-07-06T04:30:44.698269Z couchdb@10.0.0.80 <0.26789.5> -------- CRASH REPORT Process (<0.26789.5>) with 2 neighbors exited with reason: {timeout,{gen_server,call,[<0.26790.5>,find_header]}} at gen_server:call/2(line:206) <= couch_file:read_header/1(line:378) <= couch_bt_engine:init/2(line:157) <= couch_db_engine:init/3(line:775) <= couch_db_updater:init/1(line:43) <= proc_lib:init_p_do_apply/3(line:247); initial_call: {couch_db_updater,init,['Argument__1']}, ancestors: [<0.26784.5>], message_queue_len: 0, messages: [], links: [<0.26784.5>,<0.26790.5>], dictionary: [{io_priority,{db_update,<<"shards/95555553-aaaaaaa7/twitter.16570671...">>}},...], trap_exit: false, status: running, heap_size: 610, stack_size: 27, reductions: 250

[error] 2022-07-06T04:56:10.077664Z couchdb@10.0.0.80 <0.6591.6> -------- CRASH REPORT Process (<0.6591.6>) with 2 neighbors exited with reason: {timeout,{gen_server,call,[<0.6593.6>,find_header]}} at gen_server:call/2(line:206) <= couch_file:read_header/1(line:378) <= couch_bt_engine:init/2(line:157) <= couch_db_engine:init/3(line:775) <= couch_db_updater:init/1(line:43) <= proc_lib:init_p_do_apply/3(line:247); initial_call: {couch_db_updater,init,['Argument__1']}, ancestors: [<0.6584.6>], message_queue_len: 0, messages: [], links: [<0.6584.6>,<0.6593.6>], dictionary: [{io_priority,{db_update,<<"shards/95555553-aaaaaaa7/twitter.16570671...">>}},...], trap_exit: false, status: running, heap_size: 610, stack_size: 27, reductions: 250

[info] 2022-07-06T04:56:10.077711Z couchdb@10.0.0.80 <0.228.0> -------- db shards/95555553-aaaaaaa7/twitter.1657067184 died with reason {timeout,{gen_server,call,[<0.6593.6>,find_header]}}

[info] 2022-07-07T06:44:13.863950Z couchdb@10.0.0.80 <0.228.0> -------- db shards/95555553-aaaaaaa7/twitter.1657067184 died with reason {timeout,{gen_server,call,[<0.9139.29>,find_header]}}

[error] 2022-07-07T06:44:13.864516Z couchdb@10.0.0.80 <0.9152.29> -------- CRASH REPORT Process (<0.9152.29>) with 2 neighbors exited with reason: {timeout,{gen_server,call,[<0.9139.29>,find_header]}} at gen_server:call/2(line:206) <= couch_file:read_header/1(line:378) <= couch_bt_engine:init/2(line:157) <= couch_db_engine:init/3(line:775) <= couch_db_updater:init/1(line:43) <= proc_lib:init_p_do_apply/3(line:247); initial_call: {couch_db_updater,init,['Argument__1']}, ancestors: [<0.9136.29>], message_queue_len: 0, messages: [], links: [<0.9136.29>,<0.9139.29>], dictionary: [{io_priority,{db_update,<<"shards/95555553-aaaaaaa7/twitter.16570671...">>}},...], trap_exit: false, status: running, heap_size: 610, stack_size: 27, reductions: 250

Cheers,
Luca Morandini