Public bug reported:

After an unfortunate incident with dhcpd going away, we lost 3 of the 6 nodes in our Ceph cluster and had to remotely power-cycle them to get them back. Now that everything is back up, the cluster has mostly recovered, but a couple of PGs were stuck in an inconsistent state, so I ran 'ceph osd repair' on one of the OSDs involved in the inconsistent PGs. It ran for a while and fixed some things, and then exploded with this:
2013-09-18 18:52:24.116439 7fdf4e2d9700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::recover_got(hobject_t, eversion_t)' thread 7fdf4e2d9700 time 2013-09-18 18:52:24.035055
osd/ReplicatedPG.cc: 5351: FAILED assert(missing.num_missing() == 0)

 ceph version 0.48.3argonaut (commit:920f82e805efec2cae05b79c155c07df0f3ed5dd)
 1: (ReplicatedPG::recover_got(hobject_t, eversion_t)+0x4d4) [0x7fdf60c29794]
 2: (ReplicatedPG::submit_push_complete(ObjectRecoveryInfo&, ObjectStore::Transaction*)+0x490) [0x7fdf60c2c950]
 3: (ReplicatedPG::handle_pull_response(std::tr1::shared_ptr<OpRequest>)+0x4c6) [0x7fdf60c4ac26]
 4: (ReplicatedPG::sub_op_push(std::tr1::shared_ptr<OpRequest>)+0x96) [0x7fdf60c4ba66]
 5: (ReplicatedPG::do_sub_op(std::tr1::shared_ptr<OpRequest>)+0x3f7) [0x7fdf60c4bf17]
 6: (PG::do_request(std::tr1::shared_ptr<OpRequest>)+0xa7) [0x7fdf60d03a07]
 7: (OSD::dequeue_op(PG*)+0x23a) [0x7fdf60cc156a]
 8: (ThreadPool::worker()+0x4c4) [0x7fdf60e86dd4]
 9: (ThreadPool::WorkThread::entry()+0xd) [0x7fdf60cdab2d]
 10: (()+0x7e9a) [0x7fdf604aee9a]
 11: (clone()+0x6d) [0x7fdf5e9baccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This was followed by about 10K more lines of spew about what it was doing.

This is ceph 0.48.3-0ubuntu1~cloud0 from the Folsom pocket of the Ubuntu Cloud Archive, and the machine is running Ubuntu 12.04 LTS.

** Affects: ceph (Ubuntu)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1227327

Title:
  ceph osd repair fails with assert(missing.num_missing() == 0)