Hi Carlos,

The SO post is geared more towards a larger rebalancing effort, which is why it copies the files around directly (it's a lot faster).
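In rough terms, the copy step from that post amounts to something like the sketch below. Only the shard name comes from your logs; the data directory path, the rsync invocation and the choice of couch-3 as the target are purely illustrative:

```
# On the node that should receive the shard (couch-3 here, for illustration),
# copy the shard file into place *before* CouchDB is told about the new location.
# /opt/couchdb/data is an assumed data_dir; adjust it to your installation.
mkdir -p /opt/couchdb/data/shards/0fffffff-15555553
rsync -a couch-0:/opt/couchdb/data/shards/0fffffff-15555553/my_db.1500994155.couch \
      /opt/couchdb/data/shards/0fffffff-15555553/
```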
If you want to rely on the internal replicator, that's fine, but it's noisy. It would be a bit of work to describe how to do that with a minimum of noise while taking care of correctness. At Cloudant we obviously do this kind of operation from time to time and have our own tools and procedures for it. I will mention this again internally, as it really should be easier to do in CouchDB. (A rough sketch of the port-5986 replication call you asked about is inline below, under your quoted message.)

To your other question: the shard map stored in the '_dbs' database is considered definitive. All nodes have a copy of that (unsharded) database, and all nodes cache those maps in memory as they are consulted very frequently. Most actions that occur in the mem3 tier (where knowledge of all shards lives and where the internal replicator lives) will automatically create files that are missing, as you've seen.

As a general point, you shouldn't be creating or deleting files in the database directory while CouchDB is running. My procedure copies the file to its new location _before_ telling CouchDB about it, and that's important. CouchDB keeps an open file descriptor for performance reasons, so copying data into those files via external processes, or even deleting the files, can have unexpected consequences, and data loss is certainly possible.

B.

> On 27 Jul 2017, at 09:17, Carlos Alonso <[email protected]> wrote:
>
> Hi Robert, thanks for your reply. Very good SO response indeed.
>
> Just a couple of questions remain.
>
> When you say "Perform a replication, again taking care to do this on port 5986", what exactly do you mean? Could you please paste an example replication document? Isn't a replication started as soon as the shard map is modified?
>
> Also, related to the issue I've experienced: when I manually delete the shard from the first location, is there anything internal to CouchDB that still references it? When I tried to move it back, the logs showed an error complaining that the file already existed (but in reality it didn't). Please see those errors pasted below:
>
> 1. Tries to create the shard, and somehow detects that it existed before (probably something I forgot to delete in step 6)
>
> `mem3_shards tried to create shards/0fffffff-15555553/my-db.1500994155, got file_exists` (edited)
>
> 2. gen_server crashes
>
> `CRASH REPORT Process (<0.27288.4>) with 0 neighbors exited with reason: no match of right hand value {error,enoent} at couch_file:sync/1(line:211) <= couch_db_updater:sync_header/2(line:987) <= couch_db_updater:update_docs_int/5(line:906) <= couch_db_updater:handle_info/2(line:289) <= gen_server:handle_msg/5(line:599) <= proc_lib:wake_up/3(line:247) at gen_server:terminate/6(line:737) <= proc_lib:wake_up/3(line:247); initial_call: {couch_db_updater,init,['Argument__1']}, ancestors: [<0.27267.4>], messages: [], links: [<0.210.0>], dictionary: [{io_priority,{db_update,<<"shards/0fffffff-15555553/my_db...">>}},...], trap_exit: false, status: running, heap_size: 6772, stack_size: 27, reductions: 300961927`
>
> 3. Seems to somehow recover and tries to open the file again
>
> ```
> Could not open file ./data/shards/0fffffff-15555553/my_db.1500994155.couch: no such file or directory
>
> open_result error {not_found,no_db_file} for shards/0fffffff-15555553/my_db.1500994155
> ```
>
> 4. Tries to create the file
>
> `creating missing database: shards/0fffffff-15555553/my_db.1500994155`
>
> Thank you very much.
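To make the port-5986 step concrete: the replication in the SO write-up is an ordinary `_replicate` request against the node-local (5986) interface, with the shard names as source and target. Something along these lines, reusing the shard from your logs (hostnames are illustrative and admin credentials are omitted):

```
# Run on a node that already has a good copy of the shard; the target is the
# node-local (5986) interface of the node that should receive or catch up the shard.
curl -X POST http://127.0.0.1:5986/_replicate \
     -H 'Content-Type: application/json' \
     -d '{"source": "shards/0fffffff-15555553/my_db.1500994155",
          "target": "http://couch-0:5986/shards/0fffffff-15555553/my_db.1500994155"}'

# The definitive shard map itself is a document in the node-local _dbs database:
curl http://127.0.0.1:5986/_dbs/my_db
```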
>> On Thu, Jul 27, 2017 at 9:59 AM Robert Samuel Newson <[email protected]> wrote:
>>
>> Not sure if you saw my write-up from the BigCouch era, still valid for CouchDB 2.0:
>>
>> https://stackoverflow.com/questions/6676972/moving-a-shard-from-one-bigcouch-server-to-another-for-balancing
>>
>> Shard moving / database rebalancing is definitely a bit tricky and we could use better tools for it.
>>
>>> On 26 Jul 2017, at 18:10, Carlos Alonso <[email protected]> wrote:
>>>
>>> Hi!
>>>
>>> I have had a few log errors when moving a shard under particular circumstances, and I'd like to share them here and get your input on whether this should be reported or not.
>>>
>>> So let me describe the steps I took:
>>>
>>> 1. 3-node cluster (couch-0, couch-1 and couch-2), 1 database (my_db) with 48 shards and 1 replica
>>> 2. A 4th node (couch-3) is added to the cluster
>>> 3. Change the shard map so that the new node gets one of the shards from couch-0 (at this moment both couch-0 and couch-3 contain the shard)
>>> 4. Synchronisation happens and the new node gets its shard
>>> 5. Change the shard map again so that couch-0 is no longer that shard's owner
>>> 6. I go into the couch-0 node and manually delete the .couch file of the shard, to reclaim disk space
>>> 7. All fine here
>>> 8. Now I want to put the shard back into the original node, where it was before
>>> 9. I put couch-0 into maintenance mode (I always do this before adding a shard to a node, to avoid it responding to reads before it is synced)
>>> 10. Modify the shard map, adding the shard to couch-0
>>> 11. All nodes' logs fill up with errors (details below)
>>> 12. I remove couch-0's maintenance mode and things seem to flow again
>>>
>>> So that is the process; now let me describe what I spotted in the logs.
>>>
>>> couch-0 seems to go through a few stages:
>>>
>>> 1. Tries to create the shard, and somehow detects that it existed before (probably something I forgot to delete in step 6)
>>>
>>> `mem3_shards tried to create shards/0fffffff-15555553/my-db.1500994155, got file_exists` (edited)
>>>
>>> 2. gen_server crashes
>>>
>>> `CRASH REPORT Process (<0.27288.4>) with 0 neighbors exited with reason: no match of right hand value {error,enoent} at couch_file:sync/1(line:211) <= couch_db_updater:sync_header/2(line:987) <= couch_db_updater:update_docs_int/5(line:906) <= couch_db_updater:handle_info/2(line:289) <= gen_server:handle_msg/5(line:599) <= proc_lib:wake_up/3(line:247) at gen_server:terminate/6(line:737) <= proc_lib:wake_up/3(line:247); initial_call: {couch_db_updater,init,['Argument__1']}, ancestors: [<0.27267.4>], messages: [], links: [<0.210.0>], dictionary: [{io_priority,{db_update,<<"shards/0fffffff-15555553/my_db...">>}},...], trap_exit: false, status: running, heap_size: 6772, stack_size: 27, reductions: 300961927`
>>>
>>> 3. Seems to somehow recover and tries to open the file again
>>>
>>> ```
>>> Could not open file ./data/shards/0fffffff-15555553/my_db.1500994155.couch: no such file or directory
>>>
>>> open_result error {not_found,no_db_file} for shards/0fffffff-15555553/my_db.1500994155
>>> ```
>>>
>>> 4. Tries to create the file
>>>
>>> `creating missing database: shards/0fffffff-15555553/my_db.1500994155`
>>>
>>> 5. Continuously fails because it cannot load validation funcs, possibly because of the maintenance mode?
>>>
>>> ```
>>> Error in process <0.2126.141> on node 'couchdb@couch-0' with exit value: {{badmatch,{error,{maintenance_mode,nil,'couchdb@couch-0'}}},[{ddoc_cache_opener,recover_validation_funs,1,[{file,"src/ddoc_cache_opener.erl"},{line,127}]},{ddoc_cache_opener,fetch_doc_data,1,[...
>>>
>>> Error in process <0.1970.141> on node 'couchdb@couch-0' with exit value: {{case_clause,{error,{{badmatch,{error,{maintenance_mode,nil,'couchdb@couch-0'}}},[{ddoc_cache_opener,recover_validation_funs,1,[{file,"src/ddoc_cache_opener.erl"},{line,127}]},{ddoc_cache_opener...
>>>
>>> could not load validation funs {{case_clause,{error,{{badmatch,{error,{maintenance_mode,nil,'couchdb@couch-0'}}},[{ddoc_cache_opener,recover_validation_funs,1,[{file,"src/ddoc_cache_opener.erl"},{line,127}]},{ddoc_cache_opener,fetch_doc_data,1,[{file,"src/ddoc_cache_opener.erl"},{line,240}]}]}}},[{ddoc_cache_opener,handle_open_response,1,[{file,"src/ddoc_cache_opener.erl"},{line,282}]},{couch_db,'-load_validation_funs/1-fun-0-',1,[{file,"src/couch_db.erl"},{line,659}]}]}
>>> ```
>>>
>>> couch-3 shows a warning and an error:
>>>
>>> ```
>>> [warning] ... -------- mem3_sync shards/0fffffff-15555553/my_db.1500994155 couchdb@couch-0 {internal_server_error,[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,267}]},{mem3_rep,save_on_target,3,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,286}]},{mem3_rep,replicate_batch,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,256}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,178}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
>>> ```
>>>
>>> ```
>>> [error] ... -------- Error in process <0.13658.13> on node 'couchdb@couch-3' with exit value: {internal_server_error,[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,267}]},{mem3_rep,save_on_target,3,[{file,"src/mem3_rep.erl"},{line,286}]},{mem3_rep,replicate_batch,1,[{file,"src/mem3_rep.erl"},{line,256}]},{mem3_rep...
>>> ```
>>>
>>> Which, to me, means that couch-0 is responding with internal server errors to its requests.
>>>
>>> couch-1, which is, by the way, the owner of the task replicating my_db from a remote server, seems to go through two stages.
>>>
>>> First it seems unable to continue with the replication process because it receives a 500 error (maybe from couch-0?):
>>>
>>> ```
>>> [error] ... req_err(4096501418) unknown_error : badarg [<<"dict:fetch/2 L130">>,<<"couch_util:-reorder_results/2-lc$^1/1-1-/2 L424">>,<<"couch_util:-reorder_results/2-lc$^1/1-1-/2 L424">>,<<"fabric_doc_update:go/3 L41">>,<<"fabric:update_docs/3 L259">>,<<"chttpd_db:db_req/2 L445">>,<<"chttpd:process_request/1 L293">>,<<"chttpd:handle_request_int/1 L229">>]
>>>
>>> [notice] ... 127.0.0.1:5984 127.0.0.1 undefined POST /my_db/_bulk_docs 500 ok 114
>>>
>>> [notice] ... Retrying POST request to http://127.0.0.1:5984/my_db/_bulk_docs in 0.25 seconds due to error {code,500}
>>> ```
>>>
>>> After disabling maintenance mode on couch-0 the replication process seems to work again, but a few seconds later lots of new errors appear:
>>>
>>> ```
>>> [error] -------- rexi_server exit:timeout [{rexi,init_stream,1,[{file,"src/rexi.erl"},{line,256}]},{rexi,stream2,3,[{file,"src/rexi.erl"},{line,204}]},{fabric_rpc,view_cb,2,[{file,"src/fabric_rpc.erl"},{line,286}]},{couch_mrview,finish_fold,2,[{file,"src/couch_mrview.erl"},{line,632}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
>>> ```
>>>
>>> And a few seconds later (I have not been able to correlate it to anything so far) they stop.
>>>
>>> Finally, couch-2 shows just one error, the same as the last one from couch-1:
>>>
>>> ```
>>> [error] -------- rexi_server exit:timeout [{rexi,init_stream,1,[{file,"src/rexi.erl"},{line,256}]},{rexi,stream2,3,[{file,"src/rexi.erl"},{line,204}]},{fabric_rpc,view_cb,2,[{file,"src/fabric_rpc.erl"},{line,286}]},{couch_mrview,finish_fold,2,[{file,"src/couch_mrview.erl"},{line,632}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
>>> ```
>>>
>>> *In conclusion:*
>>>
>>> To me it looks like two things are involved here:
>>>
>>> 1. The fact that I deleted the file from disk while something else still knows that it should be there
>>> 2. The fact that the node is under maintenance, which seems to prevent new shards from being created
>>>
>>> Sorry for such a wall of text. I hope it is detailed enough to get someone's input that can help me confirm or refute my theories and decide whether it makes sense to open a GH issue to make this part of the process more robust.
>>>
>>> Regards
>>>
>>> Carlos Alonso
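One last note on the maintenance-mode angle from the quoted thread: maintenance mode is a per-node setting in the `[couchdb]` config section, so it can be inspected and flipped on the node being worked on. A minimal sketch, assuming the node-local config API is reachable on port 5986 (the exact config endpoint differs between 2.x releases, so adjust the path to your version):

```
# Check the current value on the node itself (a 404 simply means it was never set).
curl http://127.0.0.1:5986/_config/couchdb/maintenance_mode

# Enable it before adding the shard to the node, and disable it again once the
# shard has caught up and can safely serve reads.
curl -X PUT http://127.0.0.1:5986/_config/couchdb/maintenance_mode -d '"true"'
curl -X PUT http://127.0.0.1:5986/_config/couchdb/maintenance_mode -d '"false"'
```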
