Hi Carlos,

Ah, this is an interesting edge case. The “security object” contains the 
“admins” and “members” metadata for a database. For historical reasons it is 
*not* versioned like a normal document. Under normal operating circumstances 
every replica of every shard contains a copy of the security object for the 
database.

When you add a replica for an existing shard, that replica does not yet have 
the security object. An internal process in the cluster regularly checks that 
the security objects for a database are in sync. That process has a safeguard: 
it bails out and does nothing unless it can retrieve the security object from a 
simple majority of all shard replicas of the database in question. Your 
statement that “having less than half of the replica nodes available for the 
database … raises this error” is almost correct; technically, the error occurs 
when the cluster is unable to contact a majority of the *shard replicas*, 
regardless of which nodes are hosting them.
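
To make the distinction between nodes and shard replicas concrete, here is a 
rough sketch in Python. It is not the actual Erlang code in CouchDB; the names 
ShardReplica, has_majority and sync_security are made up for this example, and 
“simple majority” is assumed to mean strictly more than half.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ShardReplica:
    node: str          # hypothetical node name
    shard_range: str   # hypothetical label for the shard range
    reachable: bool    # False if, for example, the node is in maintenance mode


def has_majority(replicas: Sequence[ShardReplica]) -> bool:
    """Return True if strictly more than half of all shard replicas responded."""
    responded = sum(1 for r in replicas if r.reachable)
    return 2 * responded > len(replicas)


def sync_security(replicas: Sequence[ShardReplica]) -> str:
    """Bail out with "no_majority" unless a majority of shard replicas responded."""
    if not has_majority(replicas):
        return "no_majority"
    # A real implementation would compare and converge the security objects here.
    return "ok"


def layout(a_up: bool, b_up: bool, c_up: bool) -> list:
    """Six shard replicas spread unevenly: node A hosts four, B and C one each."""
    return (
        [ShardReplica("A", f"range-{i}", a_up) for i in range(4)]
        + [ShardReplica("B", "range-0", b_up), ShardReplica("C", "range-1", c_up)]
    )


# Two of three nodes are up, but only 2 of 6 shard replicas respond.
only_a_down = sync_security(layout(False, True, True))

# One of three nodes is up, yet 4 of 6 shard replicas respond.
only_a_up = sync_security(layout(True, False, False))
```

In this made-up layout, losing node A alone is enough to trigger the safeguard 
even though most nodes are still available, while node A on its own is enough 
to satisfy it.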

Hopefully this is an unusual scenario. That said, we could think about 
improving the cluster’s behavior here by allowing the security synchronization 
process to “punch through” maintenance mode and retrieve the security objects 
from those shards for the purposes of establishing a majority and subsequently 
converging all the shards. I think that’s worth further discussion in a GitHub 
issue at least.

Cheers, Adam

> On Jun 14, 2017, at 12:57 PM, Carlos Alonso <carlos.alo...@cabify.com> wrote:
> 
> Ok, so I've made some progress on this and I'd like to share it here.
> 
> So the error says "*Error getting security objects for
> <<"affected_database_here">> : {error,no_majority}*" and that is actually
> not related to configuring a new replica node as I was saying before but to
> nodes in maintenance mode when read/write operations happen.
> 
> In summary, having less than half of the replica nodes available for the
> database you're working on raises this error. The database is available
> though (maximum availability by design I guess :))
> 
> My question then is, what does this error exactly mean? What are the so
> called security objects? Is it something one has to carefully consider
> avoiding?
> 
> Thank you.
> 
> On Tue, Jun 13, 2017 at 7:34 PM Carlos Alonso <carlos.alo...@cabify.com> wrote:
> 
>> Hi guys!
>> 
>> I continue trying to understand how CouchDB clusters work and trying to
>> build a compelling administration tool that covers basic operations such as
>> adding a node to the cluster, moving a shard from one node to another and
>> so on. It is WIP but already open sourced here:
>> https://github.com/cabify/couchdb-admin
>> 
>> Testing the scale out procedure (add node, make it replicate some shards,
>> remove the shard from the previous location) I've seen the following error:
>> 
>> [error] 2017-06-13T15:58:22.299140Z couchdb@couch-2.couchdb2-replica-admin
>> <0.2214.3> -------- Error getting security objects for <<"testdb3">>:
>> {error,no_majority}
>> 
>> 
>> Not only mentioning my testdb3 but also with internal ones such as
>> _global_changes. I mean, I was scaling out testdb3, but errors appeared
>> referring to testdb3 and also _global_changes, but I wasn't scaling out
>> _global_changes.
>> 
>> 
>> The error appears when I configure a new node as a replica for an
>> existing shard (by adding it to the by_node and by_range sections of the
>> document at _dbs/testdb3)
>> 
>> 
>> The error appears every few seconds on the new replica logs once for each
>> of the other replicas (3 for testdb3 and 2 for _global_changes at that
>> time) and it also appears on the other nodes' logs but just once every few
>> seconds.
>> 
>> 
>> The error stops appearing once I remove the maintenance_mode flag on the
>> new replica, which I do once pending_changes messages stop appearing
>> there. (I enable that flag before configuring the node as a replica so
>> that it doesn't participate in reads. Kudos Adam Kocoloski for your advice
>> here.)
>> 
>> I think the error is preventing the catch-up process from working
>> properly, as my consistency checks fail when this error appears during the
>> procedure (it doesn't happen 100% of the time).
>> 
>> I've seen it both happening when the new replica node was completely empty
>> but also when it had the data preloaded (via rsync or because it had
>> previously been a replica).
>> 
>> 
>> I hope this much text helps you out :)
>> 
>> Thanks!
>> 
> -- 
> 
> *Carlos Alonso*
> Data Engineer
> Madrid, Spain
> 
> carlos.alo...@cabify.com