Dear All,

I moved some CouchDB 3.1.0 databases to a new 4-node cluster by
copying the shard files.

The operation worked for 5 out of 6 databases; the biggest database
(about 200GB, 12 shards, 2 replicas) did not come online on the new
cluster.

I suspect high disk latency, but I am not sure. Could someone shed some
light on this?
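
One rough way to test the disk-latency suspicion is to time small reads
from the end of the affected shard file. This is only a sketch: it
assumes that the header lookup reads the file backwards from its end in
4096-byte blocks, and the function name and parameters are made up for
illustration. Reads served from the OS page cache will look fast, so the
first run after a reboot or cache drop is the most telling.

```python
import os
import time

BLOCK_SIZE = 4096  # assumed block size; adjust if your files differ


def time_tail_reads(path, blocks=256, block_size=BLOCK_SIZE):
    """Read up to `blocks` blocks from the end of `path`, walking backwards.

    Returns (blocks_read, total_seconds, slowest_single_read_seconds).
    """
    size = os.path.getsize(path)
    # Start at the last block-aligned offset inside the file (-1 if empty).
    offset = ((size - 1) // block_size) * block_size if size > 0 else -1
    count, total, slowest = 0, 0.0, 0.0
    # buffering=0 avoids Python-level buffering; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while offset >= 0 and count < blocks:
            start = time.perf_counter()
            f.seek(offset)
            f.read(block_size)
            elapsed = time.perf_counter() - start
            total += elapsed
            slowest = max(slowest, elapsed)
            count += 1
            offset -= block_size
    return count, total, slowest
```

Calling time_tail_reads() on the shard file named in the logs below, and
on a shard of one of the databases that did come online, gives two sets
of numbers to compare. A single slow read that takes seconds would point
towards storage rather than CouchDB itself.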

The relevant logs are:

[info] 2022-07-06T04:30:44.697901Z couchdb@10.0.0.80
<0.228.0> -------- db
shards/95555553-aaaaaaa7/twitter.1657067184 died with reason
{timeout,{gen_server,call,[<0.26790.5>,find_header]}}
[error] 2022-07-06T04:30:44.698269Z couchdb@10.0.0.80
<0.26789.5> -------- CRASH REPORT Process
(<0.26789.5>) with 2 neighbors exited with reason:
{timeout,{gen_server,call,[<0.26790.5>,find_header]}} at
gen_server:call/2(line:206) <= couch_file:read_header/1(line:378)
<= couch_bt_engine:init/2(line:157) <=
couch_db_engine:init/3(line:775) <=
couch_db_updater:init/1(line:43) <=
proc_lib:init_p_do_apply/3(line:247); initial_call:
{couch_db_updater,init,['Argument__1']}, ancestors:
[<0.26784.5>], message_queue_len: 0, messages: [], links:
[<0.26784.5>,<0.26790.5>], dictionary:
[{io_priority,{db_update,<<"shards/95555553-aaaaaaa7/twitter.16570671...">>}},...],
trap_exit: false, status: running, heap_size: 610, stack_size: 27,
reductions: 250
[error] 2022-07-06T04:56:10.077664Z couchdb@10.0.0.80
<0.6591.6> -------- CRASH REPORT Process
(<0.6591.6>) with 2 neighbors exited with reason:
{timeout,{gen_server,call,[<0.6593.6>,find_header]}} at
gen_server:call/2(line:206) <= couch_file:read_header/1(line:378)
<= couch_bt_engine:init/2(line:157) <=
couch_db_engine:init/3(line:775) <=
couch_db_updater:init/1(line:43) <=
proc_lib:init_p_do_apply/3(line:247); initial_call:
{couch_db_updater,init,['Argument__1']}, ancestors:
[<0.6584.6>], message_queue_len: 0, messages: [], links:
[<0.6584.6>,<0.6593.6>], dictionary:
[{io_priority,{db_update,<<"shards/95555553-aaaaaaa7/twitter.16570671...">>}},...],
trap_exit: false, status: running, heap_size: 610, stack_size: 27,
reductions: 250
[info] 2022-07-06T04:56:10.077711Z couchdb@10.0.0.80
<0.228.0> -------- db
shards/95555553-aaaaaaa7/twitter.1657067184 died with reason
{timeout,{gen_server,call,[<0.6593.6>,find_header]}}
[info] 2022-07-07T06:44:13.863950Z couchdb@10.0.0.80
<0.228.0> -------- db
shards/95555553-aaaaaaa7/twitter.1657067184 died with reason
{timeout,{gen_server,call,[<0.9139.29>,find_header]}}
[error] 2022-07-07T06:44:13.864516Z couchdb@10.0.0.80
<0.9152.29> -------- CRASH REPORT Process
(<0.9152.29>) with 2 neighbors exited with reason:
{timeout,{gen_server,call,[<0.9139.29>,find_header]}} at
gen_server:call/2(line:206) <= couch_file:read_header/1(line:378)
<= couch_bt_engine:init/2(line:157) <=
couch_db_engine:init/3(line:775) <=
couch_db_updater:init/1(line:43) <=
proc_lib:init_p_do_apply/3(line:247); initial_call:
{couch_db_updater,init,['Argument__1']}, ancestors:
[<0.9136.29>], message_queue_len: 0, messages: [], links:
[<0.9136.29>,<0.9139.29>], dictionary:
[{io_priority,{db_update,<<"shards/95555553-aaaaaaa7/twitter.16570671...">>}},...],
trap_exit: false, status: running, heap_size: 610, stack_size: 27,
reductions: 250

Cheers,

Luca Morandini
