Public bug reported:

Upstream bug report: http://tracker.ceph.com/issues/16525
Affected version: 10.2.0-0ubuntu0.16.04.2

Copied from upstream:

All of my ceph mons crashed simultaneously when I tried to move a host with a working OSD from one root to another.

My command was:
  ceph osd crush move pp1 root=fast2500

```
Trace:
    -9> 2016-06-29 14:10:30.337919 7f6e3951d700 5 - op tracker -- seq: 11, time: 2016-06-29 14:10:30.337919, event: callback finished, op: mon_command({"prefix": "osd crush move", "args": ["root=fast2500"], "name": "pp1"} v 0)
    -8> 2016-06-29 14:10:30.337924 7f6e3951d700 5 - op tracker -- seq: 11, time: 2016-06-29 14:10:30.337924, event: psvc:dispatch, op: mon_command({"prefix": "osd crush move", "args": ["root=fast2500"], "name": "pp1"} v 0)
    -7> 2016-06-29 14:10:30.337927 7f6e3951d700 5 mon.pp5@0(leader).paxos(paxos active c 2446247..2446981) is_readable = 1 - now=2016-06-29 14:10:30.337928 lease_expire=2016-06-29 14:10:35.337162 has v0 lc 2446981
    -6> 2016-06-29 14:10:30.337950 7f6e3951d700 5 - op tracker -- seq: 11, time: 2016-06-29 14:10:30.337950, event: osdmap:preprocess_query, op: mon_command({"prefix": "osd crush move", "args": ["root=fast2500"], "name": "pp1"} v 0)
    -5> 2016-06-29 14:10:30.337956 7f6e3951d700 5 - op tracker -- seq: 11, time: 2016-06-29 14:10:30.337956, event: osdmap:preprocess_command, op: mon_command({"prefix": "osd crush move", "args": ["root=fast2500"], "name": "pp1"} v 0)
    -4> 2016-06-29 14:10:30.338007 7f6e3951d700 5 - op tracker -- seq: 11, time: 2016-06-29 14:10:30.338007, event: osdmap:prepare_update, op: mon_command({"prefix": "osd crush move", "args": ["root=fast2500"], "name": "pp1"} v 0)
    -3> 2016-06-29 14:10:30.338016 7f6e3951d700 5 - op tracker -- seq: 11, time: 2016-06-29 14:10:30.338015, event: osdmap:prepare_command, op: mon_command({"prefix": "osd crush move", "args": ["root=fast2500"], "name": "pp1"} v 0)
    -2> 2016-06-29 14:10:30.338039 7f6e3951d700 5 - op tracker -- seq: 11, time: 2016-06-29 14:10:30.338036, event: osdmap:prepare_command_impl, op: mon_command({"prefix": "osd crush move", "args": ["root=fast2500"], "name": "pp1"} v 0)
    -1> 2016-06-29 14:10:30.338052 7f6e3951d700 0 mon.pp5@0(leader).osd e10230 moving crush item name 'pp1' to location {root=fast2500}
     0> 2016-06-29 14:10:30.341861 7f6e3951d700 -1 crush/CrushWrapper.h: In function 'int CrushWrapper::detach_bucket(CephContext*, int)' thread 7f6e3951d700 time 2016-06-29 14:10:30.338135
crush/CrushWrapper.h: 940: FAILED assert(successful_detach)

 ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55cda5fb9fa0]
 2: (()+0x560833) [0x55cda5ec8833]
 3: (CrushWrapper::move_bucket(CephContext*, int, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)+0xda) [0x55cda5ec644a]
 4: (OSDMonitor::prepare_command_impl(std::shared_ptr<MonOpRequest>, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > >&)+0x2cfe) [0x55cda5c8701e]
 5: (OSDMonitor::prepare_command(std::shared_ptr<MonOpRequest>)+0x2ff) [0x55cda5c9903f]
 6: (OSDMonitor::prepare_update(std::shared_ptr<MonOpRequest>)+0x24b) [0x55cda5c9958b]
 7: (PaxosService::dispatch(std::shared_ptr<MonOpRequest>)+0xb4f) [0x55cda5c4c0af]
 8: (PaxosService::C_RetryMessage::_finish(int)+0x58) [0x55cda5c4d698]
 9: (C_MonOp::finish(int)+0x82) [0x55cda5c15862]
 10: (Context::complete(int)+0x9) [0x55cda5c14949]
 11: (void finish_contexts<Context>(CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0x1fb) [0x55cda5c1b25b]
 12: (Paxos::finish_round()+0x287) [0x55cda5c41b17]
 13: (Paxos::handle_last(std::shared_ptr<MonOpRequest>)+0xe19) [0x55cda5c42cf9]
 14: (Paxos::dispatch(std::shared_ptr<MonOpRequest>)+0x250) [0x55cda5c43520]
 15: (Monitor::dispatch_op(std::shared_ptr<MonOpRequest>)+0xa38) [0x55cda5c0ee68]
 16: (Monitor::_ms_dispatch(Message*)+0x554) [0x55cda5c0f664]
 17: (Monitor::ms_dispatch(Message*)+0x23) [0x55cda5c326f3]
 18: (DispatchQueue::entry()+0xf2b) [0x55cda60aedfb]
 19: (DispatchQueue::DispatchThread::entry()+0xd) [0x55cda5fa032d]
 20: (()+0x76fa) [0x7f6e4165a6fa]
 21: (clone()+0x6d) [0x7f6e3f916b5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-mon.pp5.log
--- end dump of recent events ---
2016-06-29 14:10:30.346791 7f6e3951d700 -1 ** Caught signal (Aborted) *
 in thread 7f6e3951d700 thread_name:ms_dispatch

 ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
 1: (()+0x5233be) [0x55cda5e8b3be]
 2: (()+0x113d0) [0x7f6e416643d0]
 3: (gsignal()+0x38) [0x7f6e3f845418]
 4: (abort()+0x16a) [0x7f6e3f84701a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0x55cda5fba18b]
 6: (()+0x560833) [0x55cda5ec8833]
 7: (CrushWrapper::move_bucket(CephContext*, int, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)+0xda) [0x55cda5ec644a]
 8: (OSDMonitor::prepare_command_impl(std::shared_ptr<MonOpRequest>, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > >&)+0x2cfe) [0x55cda5c8701e]
 9: (OSDMonitor::prepare_command(std::shared_ptr<MonOpRequest>)+0x2ff) [0x55cda5c9903f]
 10: (OSDMonitor::prepare_update(std::shared_ptr<MonOpRequest>)+0x24b) [0x55cda5c9958b]
 11: (PaxosService::dispatch(std::shared_ptr<MonOpRequest>)+0xb4f) [0x55cda5c4c0af]
 12: (PaxosService::C_RetryMessage::_finish(int)+0x58) [0x55cda5c4d698]
 13: (C_MonOp::finish(int)+0x82) [0x55cda5c15862]
 14: (Context::complete(int)+0x9) [0x55cda5c14949]
 15: (void finish_contexts<Context>(CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0x1fb) [0x55cda5c1b25b]
 16: (Paxos::finish_round()+0x287) [0x55cda5c41b17]
 17: (Paxos::handle_last(std::shared_ptr<MonOpRequest>)+0xe19) [0x55cda5c42cf9]
 18: (Paxos::dispatch(std::shared_ptr<MonOpRequest>)+0x250) [0x55cda5c43520]
 19: (Monitor::dispatch_op(std::shared_ptr<MonOpRequest>)+0xa38) [0x55cda5c0ee68]
 20: (Monitor::_ms_dispatch(Message*)+0x554) [0x55cda5c0f664]
 21: (Monitor::ms_dispatch(Message*)+0x23) [0x55cda5c326f3]
 22: (DispatchQueue::entry()+0xf2b) [0x55cda60aedfb]
 23: (DispatchQueue::DispatchThread::entry()+0xd) [0x55cda5fa032d]
 24: (()+0x76fa) [0x7f6e4165a6fa]
 25: (clone()+0x6d) [0x7f6e3f916b5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
     0> 2016-06-29 14:10:30.346791 7f6e3951d700 -1 ** Caught signal (Aborted) *
 in thread 7f6e3951d700 thread_name:ms_dispatch

 ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
 1: (()+0x5233be) [0x55cda5e8b3be]
 2: (()+0x113d0) [0x7f6e416643d0]
 3: (gsignal()+0x38) [0x7f6e3f845418]
 4: (abort()+0x16a) [0x7f6e3f84701a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0x55cda5fba18b]
 6: (()+0x560833) [0x55cda5ec8833]
 7: (CrushWrapper::move_bucket(CephContext*, int, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)+0xda) [0x55cda5ec644a]
 8: (OSDMonitor::prepare_command_impl(std::shared_ptr<MonOpRequest>, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > >&)+0x2cfe) [0x55cda5c8701e]
 9: (OSDMonitor::prepare_command(std::shared_ptr<MonOpRequest>)+0x2ff) [0x55cda5c9903f]
 10: (OSDMonitor::prepare_update(std::shared_ptr<MonOpRequest>)+0x24b) [0x55cda5c9958b]
 11: (PaxosService::dispatch(std::shared_ptr<MonOpRequest>)+0xb4f) [0x55cda5c4c0af]
 12: (PaxosService::C_RetryMessage::_finish(int)+0x58) [0x55cda5c4d698]
 13: (C_MonOp::finish(int)+0x82) [0x55cda5c15862]
 14: (Context::complete(int)+0x9) [0x55cda5c14949]
 15: (void finish_contexts<Context>(CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0x1fb) [0x55cda5c1b25b]
 16: (Paxos::finish_round()+0x287) [0x55cda5c41b17]
 17: (Paxos::handle_last(std::shared_ptr<MonOpRequest>)+0xe19) [0x55cda5c42cf9]
 18: (Paxos::dispatch(std::shared_ptr<MonOpRequest>)+0x250) [0x55cda5c43520]
 19: (Monitor::dispatch_op(std::shared_ptr<MonOpRequest>)+0xa38) [0x55cda5c0ee68]
 20: (Monitor::_ms_dispatch(Message*)+0x554) [0x55cda5c0f664]
 21: (Monitor::ms_dispatch(Message*)+0x23) [0x55cda5c326f3]
 22: (DispatchQueue::entry()+0xf2b) [0x55cda60aedfb]
 23: (DispatchQueue::DispatchThread::entry()+0xd) [0x55cda5fa032d]
 24: (()+0x76fa) [0x7f6e4165a6fa]
 25: (clone()+0x6d) [0x7f6e3f916b5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-mon.pp5.log
```

osd tree at the moment of crash:

  ID  WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
  -17 1.00000 root fast2500
  -11 1.00000     host pp7
    0 1.00000         osd.0           up  1.00000          1.00000
   -5 9.12993 root ssd
   -4 4.79999     host pp11
   11 1.20000         osd.11          up  1.00000          1.00000
    3 1.20000         osd.3           up  1.00000          1.00000
    2 1.20000         osd.2           up  1.00000          1.00000
    1 1.20000         osd.1           up  1.00000          1.00000
   -8       0     host pp2
   -7       0     host pp3
  -12 0.25000     host pp4
    8 0.25000         osd.8           up  1.00000          1.00000
  -13 0.48000     host pp1
    4 0.48000         osd.4           up  1.00000          1.00000
   -2 0.09999     host c2
    9 0.09999         osd.9           up  0.79999          1.00000
   -1 0.09999     host c1
    6 0.09999         osd.6           up  1.00000          1.00000
   -3 0.09999     host c3
   10 0.09999         osd.10          up  1.00000          1.00000
   -6 0.70000     host c4
   12 0.70000         osd.12          up  0.79999          1.00000
   -9 0.45000     host c5
   13 0.45000         osd.13          up  1.00000          1.00000
  -10 0.45000     host c6
   14 0.45000         osd.14          up  1.00000          1.00000
  -14 0.45000     host c7
   15 0.45000         osd.15          up  1.00000          1.00000
  -15 0.79999     host c8
   16 0.79999         osd.16          up  1.00000          1.00000
  -16 0.45000     host c9
   17 0.45000         osd.17          up  1.00000          1.00000

** Affects: ceph (Ubuntu)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1597411

Title:
  host move inside CRUSH cause permanent crash for all mons
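
For triage, the failed assertion and the named frames can be pulled out of a mon log like the one in this report with a short script. This is only a sketch: the two regular expressions are assumptions based on the lines quoted in this report, not a log format that Ceph documents, and `summarize` is a hypothetical helper name. Frames without a plain symbol name, such as `(()+0x560833)` or the `void finish_contexts<Context>` template, are skipped on purpose.

```python
import re

# Assumed shape of the assertion line, taken from this report:
#   crush/CrushWrapper.h: 940: FAILED assert(successful_detach)
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<cond>.*?)\)"
)

# Assumed shape of a named backtrace frame, taken from this report:
#   5: (OSDMonitor::prepare_command(std::shared_ptr<MonOpRequest>)+0x2ff) [0x...]
# The symbol must be followed directly by "(", so anonymous frames are ignored.
FRAME_RE = re.compile(r"\d+: \((?P<sym>[A-Za-z_][\w:]*)\(")


def summarize(log_text):
    """Return ((file, line, condition) or None, [frame symbols in order])."""
    m = ASSERT_RE.search(log_text)
    failed = (m["file"], int(m["line"]), m["cond"]) if m else None
    frames = [f["sym"] for f in FRAME_RE.finditer(log_text)]
    return failed, frames


# A few lines copied from the trace in this report.
sample = """\
crush/CrushWrapper.h: 940: FAILED assert(successful_detach)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55cda5fb9fa0]
 2: (()+0x560833) [0x55cda5ec8833]
 5: (OSDMonitor::prepare_command(std::shared_ptr<MonOpRequest>)+0x2ff) [0x55cda5c9903f]
"""

if __name__ == "__main__":
    failed, frames = summarize(sample)
    print(failed)
    print(frames)
```

Because the frame pattern is not anchored to line starts, it should also work on a log whose line breaks were lost in transit, as long as the frame numbers and symbols are intact.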