Re: [ceph-users] OSD assert hit suicide timeout

2017-09-19 Thread Stanley Zhang
med out after 150 2017-09-19 03:06:39.782749 7fdfeae86700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, time_t)' thread 7fdfeae86700 time 2017-09-19  03:06:39.778940 common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")

Re: [ceph-users] ceph-osd restartd via systemd in case of disk error

2017-09-19 Thread Stanley Zhang
ng due to an IO error. Another idea: the OSD daemon keeps running in a defined error state and only stops the listeners for other OSDs and the clients.

Re: [ceph-users] [rgw][s3] Object not in objects list

2017-08-31 Thread Stanley Zhang
Your bucket index got corrupted. I believe there is no easy way to restore the index other than downloading the existing objects and re-uploading them; correct me if anybody else knows a better way. You can list all your objects in that bucket with: rados -p .rgw.buckets ls | grep default.3278576
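If you want to post-process that listing rather than just eyeball the grep output, here is a minimal sketch of the same filter in Python. It assumes you have already saved the output of `rados -p .rgw.buckets ls` as lines of text. The function name `objects_for_bucket` and the sample object names are made up for illustration and are not from any Ceph tool. Note that it does a literal substring match, whereas grep treats the dots in the marker as "any character".

```python
from typing import Iterable, List


def objects_for_bucket(listing: Iterable[str], marker: str) -> List[str]:
    """Return the names in a pool listing that contain the bucket marker.

    `listing` is any iterable of lines, for example an open file holding
    the saved output of `rados -p <pool> ls`.
    """
    return [name.strip() for name in listing if marker in name]


# Invented sample listing, only to show the behaviour.
sample = [
    "default.111.1_photos/a.jpg\n",
    "default.111.1_photos/b.jpg\n",
    "default.222.7_logs/x.log\n",
]

matches = objects_for_bucket(sample, "default.111")
print(len(matches), "objects carry the marker")
```

With a real listing you would pass the bucket marker from the command above instead of the invented `default.111`, and could then feed `matches` to whatever download and re-upload procedure you use.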

[ceph-users] deep-scrub taking long time(possible leveldb corruption?)

2017-08-01 Thread Stanley Zhang
ot-usable before but usable 2 days later? One thing that might fix the index object is a leveldb compaction, I guess. By the way, the problematic index object above has ~30k keys, while the biggest index object in our cluster holds about 300k keys. Regards, Stanley