We are using CouchDB 1.6.1 on CentOS Linux release 7.0.1406. CouchDB was
installed using `yum`.

We tried to run a data conversion on some 100 databases. Most databases have
fewer than 1500 documents (around 1 MB), except for 3 which have around 200,000
documents (around 250 MB). The conversion ran fine on a few databases, then we
started seeing `Error: connect ECONNREFUSED 127.0.0.1:5984` errors.

Conversion steps:

1. Replicate `database_1` to `database_1_backup`.
2. Delete `database_1`.
3. Recreate `database_1`.
4. Read documents from `database_1_backup` into memory.
5. Write to `database_1` using bulkDocs.
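
For reference, here is a minimal sketch of the HTTP request sequence these steps imply. It assumes the standard CouchDB endpoints (`_replicate`, `_all_docs`, `_bulk_docs`); the helper names `conversion_plan` and `bulk_docs_body` are hypothetical, and this is not the actual conversion script. Dropping `_rev` before the bulk write is also an assumption: a freshly recreated database would otherwise reject the old revisions as conflicts.

```python
def conversion_plan(db):
    """Return (method, path, body) tuples for the conversion steps of one database.

    The body of the final _bulk_docs request depends on the documents read
    in the GET step, so it is built separately by bulk_docs_body().
    """
    backup = db + "_backup"
    return [
        ("POST", "/_replicate", {"source": db, "target": backup}),  # back up
        ("DELETE", "/" + db, None),                                  # delete
        ("PUT", "/" + db, None),                                     # recreate
        ("GET", "/" + backup + "/_all_docs?include_docs=true", None),  # read
        ("POST", "/" + db + "/_bulk_docs", None),                    # write
    ]


def bulk_docs_body(rows):
    """Build the _bulk_docs payload from the "rows" of an _all_docs response."""
    docs = []
    for row in rows:
        doc = dict(row["doc"])   # copy so the input rows are not modified
        doc.pop("_rev", None)    # assumption: write docs as new, without old revs
        docs.append(doc)
    return {"docs": docs}
```

Sending these requests one at a time and waiting for each response before issuing the next keeps the delete and recreate from overlapping with a replication that is still running.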

Crash log:

[Wed, 13 Apr 2016 21:05:06 GMT] [info] [<0.2715.524>] starting new replication 
`27dd24d1bd28e13225559e3e0a6c275a` at <0.5681.524> (`database_1` -> 
`database_1_backup`)
[Wed, 13 Apr 2016 21:05:07 GMT] [info] [<0.5681.524>] recording a checkpoint 
for `database_1` -> `database_1_backup` at source update_seq 2209
[Wed, 13 Apr 2016 21:05:07 GMT] [info] [<0.2715.524>] <ip.address> - - POST 
/_replicate 200
[Wed, 13 Apr 2016 21:05:07 GMT] [info] [<0.31752.523>] <ip.address> - - GET 
/database_1_backup/ 200
[Wed, 13 Apr 2016 21:05:07 GMT] [info] [<0.10914.524>] <ip.address> - - GET 
/database_1/ 200
[Wed, 13 Apr 2016 21:05:07 GMT] [info] [<0.2623.524>] <ip.address> - - GET 
/database_1/ 200
[Wed, 13 Apr 2016 21:05:07 GMT] [info] [<0.7567.524>] <ip.address> - - DELETE 
/database_1/ 200
[Wed, 13 Apr 2016 21:05:07 GMT] [error] [<0.137.0>] ** Generic server 
couch_index_server terminating
** Last message in was {'$gen_cast',{reset_indexes,<<"database_1">>}}
** When Server state == {st,"/var/lib/couchdb"}
** Reason for termination ==
** {{badmatch,{error,eacces}},
    [{couch_file,nuke_dir,2,[{file,"couch_file.erl"},{line,237}]},
     {couch_file,'-nuke_dir/2-fun-0-',3,[{file,"couch_file.erl"},{line,228}]},
     {lists,foreach,2,[{file,"lists.erl"},{line,1323}]},
     {couch_file,nuke_dir,2,[{file,"couch_file.erl"},{line,236}]},
     {couch_index_server,handle_cast,2,
                         [{file,"src/couch_index_server.erl"},{line,117}]},
     {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,604}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}

Our second attempt to re-run the conversion crashed CouchDB completely.

[Wed, 13 Apr 2016 22:17:19 GMT] [info] [<0.19197.0>] starting new replication 
`6fe446668153db8635e9f49ddd8895f2` at <0.20012.0> (`database_2` -> `database_2`)
[Wed, 13 Apr 2016 22:17:19 GMT] [info] [<0.20012.0>] recording a checkpoint for 
`database_2` -> `database_2` at source update_seq 1631
[Wed, 13 Apr 2016 22:17:19 GMT] [error] [<0.20012.0>] Replication 
`6fe446668153db8635e9f49ddd8895f2` (`database_2` -> `database_2`) failed: 
{checkpoint_commit_failure,<<"Error updating the target checkpoint document: 
conflict">>}
[Wed, 13 Apr 2016 22:17:19 GMT] [error] [<0.20012.0>] ** Generic server 
<0.20012.0> terminating
** Last message in was {'EXIT',<0.20027.0>,normal}
** When Server state == {rep_state,
                         {rep,
                          {"6fe446668153db8635e9f49ddd8895f2",[]},
                          <<"database_2">>,<<"database_2">>,
                          [{checkpoint_interval,5000},
                           {connection_timeout,30000},
                           {http_connections,20},
                           {retries,10},
                           {socket_options,[{keepalive,true},{nodelay,false}]},
                           {use_checkpoints,true},
                           {worker_batch_size,500},
                           {worker_processes,4}],


erl_crash.dump: https://paste.ee/r/EWRYV

SELinux is not an issue here, at least not this time.

Any help debugging this crash log would be greatly appreciated.

Thanks,

Sajin Shrestha
