Hello Joao,

Thanks for your help. I increased the logging level on the failed monitor and saw a lot of cephx authentication errors. After verifying NTP sync, I found that the monitor keyring deployed on the working monitors differed from the one stored in the management server’s ceph.mon.keyring. Syncing the key and redeploying the monitors got them to peer and establish quorum.
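For anyone hitting the same symptom, the keyring comparison can be sketched roughly as below. This is an illustration in Python, not the exact steps I ran: the helper names `mon_key` and `keyrings_match` are made up for the example, and it assumes the usual keyring layout of a `[mon.]` section containing a `key = ...` line. You would feed it the contents of the management server's ceph.mon.keyring and of the keyring deployed on a monitor.

```python
def mon_key(keyring_text, section="mon."):
    """Return the 'key' value from the given keyring section, or None if absent.

    Assumes the common keyring layout: a '[mon.]' header followed by
    (usually indented) 'name = value' lines.
    """
    current = None
    for raw in keyring_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()  # entering a new section
            continue
        if current == section and "=" in line:
            # Split on the first '=' only: base64 keys often end in '='.
            name, value = line.split("=", 1)
            if name.strip() == "key":
                return value.strip()
    return None


def keyrings_match(a_text, b_text, section="mon."):
    """True only if both keyrings carry the same key for the section."""
    a = mon_key(a_text, section)
    b = mon_key(b_text, section)
    return a is not None and a == b
```

A `False` result for a monitor host is the situation described above: that monitor was deployed with a key that differs from the management server's copy, which would be consistent with cephx rejecting it.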
> On Dec 14, 2015, at 11:10 , deeepdish <deeepd...@gmail.com> wrote:
>
> Joao,
>
> Please see below. I think you’re totally right on:
>
>> I suspect they may already have this monitor in their map, but either
>> with a different name or a different address -- and are thus ignoring
>> probes from a peer that does not match what they are expecting.
>
>
> The monitor in question has been previously working (quorum). It was
> removed and now attempting to re-add using a different IP address as per
> public procedure:
> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/ (I
> followed the 'CHANGING A MONITOR’S IP ADDRESS (THE RIGHT WAY)' procedure)
>
> # ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.smg01.asok mon_status
> {
>     "name": "smg01",
>     "rank": 0,
>     "state": "probing",
>     "election_epoch": 0,
>     "quorum": [],
>     "outside_quorum": [
>         "smg01"
>     ],
>     "extra_probe_peers": [
>         "10.20.1.8:6789\/0",
>         "10.20.10.251:6789\/0",
>         "10.20.10.252:6789\/0"
>     ],
>     "sync_provider": [],
>     "monmap": {
>         "epoch": 0,
>         "fsid": "693834c1-1f95-4237-ab97-a767b0c0e6e7",
>         "modified": "0.000000",
>         "created": "0.000000",
>         "mons": [
>             {
>                 "rank": 0,
>                 "name": "smg01",
>                 "addr": "10.20.10.250:6789\/0"
>             },
>             {
>                 "rank": 1,
>                 "name": "smon01s",
>                 "addr": "0.0.0.0:0\/1"
>             },
>             {
>                 "rank": 2,
>                 "name": "smon02s",
>                 "addr": "0.0.0.0:0\/2"
>             },
>             {
>                 "rank": 3,
>                 "name": "b02s08",
>                 "addr": "0.0.0.0:0\/3"
>             }
>         ]
>     }
> }
>
>
> # ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.smon01.asok mon_status
> {
>     "name": "smon01",
>     "rank": 1,
>     "state": "peon",
>     "election_epoch": 2702,
>     "quorum": [
>         0,
>         1,
>         2
>     ],
>     "outside_quorum": [],
>     "extra_probe_peers": [],
>     "sync_provider": [],
>     "monmap": {
>         "epoch": 12,
>         "fsid": "693834c1-1f95-4237-ab97-a767b0c0e6e7",
>         "modified": "2015-12-09 06:23:43.665100",
>         "created": "0.000000",
>         "mons": [
>             {
>                 "rank": 0,
>                 "name": "b02s08",
>                 "addr": "10.20.1.8:6789\/0"
>             },
>             {
>                 "rank": 1,
>                 "name": "smon01",
>                 "addr": "10.20.10.251:6789\/0"
>             },
>             {
>                 "rank": 2,
>                 "name": "smon02",
>                 "addr": "10.20.10.252:6789\/0"
>             }
>         ]
>     }
> }
>
> # ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.smon02.asok mon_status
> {
>     "name": "smon02",
>     "rank": 2,
>     "state": "peon",
>     "election_epoch": 2702,
>     "quorum": [
>         0,
>         1,
>         2
>     ],
>     "outside_quorum": [],
>     "extra_probe_peers": [],
>     "sync_provider": [],
>     "monmap": {
>         "epoch": 12,
>         "fsid": "693834c1-1f95-4237-ab97-a767b0c0e6e7",
>         "modified": "2015-12-09 06:23:43.665100",
>         "created": "0.000000",
>         "mons": [
>             {
>                 "rank": 0,
>                 "name": "b02s08",
>                 "addr": "10.20.1.8:6789\/0"
>             },
>             {
>                 "rank": 1,
>                 "name": "smon01",
>                 "addr": "10.20.10.251:6789\/0"
>             },
>             {
>                 "rank": 2,
>                 "name": "smon02",
>                 "addr": "10.20.10.252:6789\/0"
>             }
>         ]
>     }
> }
>
>
> # ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.b02s08.asok mon_status
> {
>     "name": "b02s08",
>     "rank": 0,
>     "state": "leader",
>     "election_epoch": 2702,
>     "quorum": [
>         0,
>         1,
>         2
>     ],
>     "outside_quorum": [],
>     "extra_probe_peers": [],
>     "sync_provider": [],
>     "monmap": {
>         "epoch": 12,
>         "fsid": "693834c1-1f95-4237-ab97-a767b0c0e6e7",
>         "modified": "2015-12-09 06:23:43.665100",
>         "created": "0.000000",
>         "mons": [
>             {
>                 "rank": 0,
>                 "name": "b02s08",
>                 "addr": "10.20.1.8:6789\/0"
>             },
>             {
>                 "rank": 1,
>                 "name": "smon01",
>                 "addr": "10.20.10.251:6789\/0"
>             },
>             {
>                 "rank": 2,
>                 "name": "smon02",
>                 "addr": "10.20.10.252:6789\/0"
>             }
>         ]
>     }
> }
>
>
>
>> On Dec 14, 2015, at 04:56 , Joao Eduardo Luis <j...@suse.de> wrote:
>>
>> On 12/14/2015 12:41 AM, deeepdish wrote:
>>> Perhaps I’m not understanding something..
>>>
>>> The “extra_probe_peers” ARE the other working monitors in quorum out of
>>> the mon_host line in ceph.conf.
>>>
>>> In the example below 10.20.1.8 = b20s08; 10.20.10.251 = smon01s;
>>> 10.20.10.252 = smon02s
>>>
>>> The monitor is not reaching out to the other IPs and syncing. I’m able
>>> to ping all IPs in the extra_probe_peers list.
>>
>> Okay, so that means the other monitors are, for some reason, ignoring
>> the probes from this monitor.
>>
>> Can you please show the result of mon_status from the monitors in the
>> quorum?
>>
>> I suspect they may already have this monitor in their map, but either
>> with a different name or a different address -- and are thus ignoring
>> probes from a peer that does not match what they are expecting.
>>
>> -Joao
>
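The name/address check Joao suggests can also be done mechanically once the mon_status outputs are saved. The following is a minimal Python sketch under that assumption; the function names `monmap_entries` and `monmap_mismatches` are hypothetical, and it only relies on the `monmap.mons[].name` and `monmap.mons[].addr` fields visible in the outputs quoted above.

```python
import json


def monmap_entries(mon_status_json):
    """Return {name: addr} taken from the 'monmap' part of a mon_status JSON document."""
    status = json.loads(mon_status_json)
    return {mon["name"]: mon["addr"] for mon in status["monmap"]["mons"]}


def monmap_mismatches(probing_json, quorum_json):
    """Compare the probing monitor's monmap against one from a quorum member.

    Returns three sorted lists:
      - names only the probing monitor knows,
      - names only the quorum member knows,
      - names both know but with different addresses.
    """
    probing = monmap_entries(probing_json)
    quorum = monmap_entries(quorum_json)
    only_probing = sorted(set(probing) - set(quorum))
    only_quorum = sorted(set(quorum) - set(probing))
    addr_conflicts = sorted(
        name for name in set(probing) & set(quorum) if probing[name] != quorum[name]
    )
    return only_probing, only_quorum, addr_conflicts
```

Fed the smg01 and b02s08 outputs quoted above, this would report smg01, smon01s and smon02s as known only to the probing monitor, smon01 and smon02 as known only to the quorum, and an address conflict for b02s08. It surfaces only name and address differences; it says nothing about authentication problems.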
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com