Re: [ceph-users] OSD crashed during the fio test

2019-10-01 Thread Brad Hubbard
If it is only this one OSD, I'd be inclined to take a hard look at the underlying hardware and how it behaves/performs compared to the hardware backing identical OSDs. The less likely possibility is that you have some sort of "hot spot" causing resource contention for that OSD. To investigate that fu

Re: [ceph-users] OSD crashed during the fio test

2019-10-01 Thread Sasha Litvak
I updated the firmware and kernel and am now running torture tests. So far no assert, but I still noticed this on the same OSD as yesterday:
Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 19:35:13.721 7f8d03150700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8cd05d7700' had timed out aft
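To see whether these heartbeat timeouts are confined to one OSD, the journal lines can be tallied per daemon. The following is only a rough sketch: the regular expression is modelled on the single log line quoted above, and the function name `count_heartbeat_timeouts` and the sample data are illustrative assumptions, not part of Ceph.

```python
import re
from collections import Counter

# Pattern modelled on the one journal line quoted in this thread (an assumption:
# other Ceph releases may format the message differently).
TIMEOUT_RE = re.compile(
    r"(?P<daemon>ceph-osd-\d+)\[\d+\]:.*heartbeat_map is_healthy "
    r"'(?P<pool>[^']+) thread 0x[0-9a-f]+' had timed out"
)


def count_heartbeat_timeouts(lines):
    """Count 'had timed out' heartbeat messages per (daemon, thread pool)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("daemon"), match.group("pool"))] += 1
    return counts


# Sample input: the two log lines quoted in this thread, truncated as in the archive.
sample_lines = [
    "Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 19:35:13.721 "
    "7f8d03150700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread "
    "0x7f8cd05d7700' had timed out aft",
    "Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 "
    "7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) "
    "aio_submit retries 16",
]

for (daemon, pool), n in count_heartbeat_timeouts(sample_lines).items():
    print(daemon, pool, n)
```

If one daemon dominates the counts while OSDs on identical hardware stay quiet, that would be consistent with a problem in the disk or controller behind that OSD rather than in the cluster as a whole.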

Re: [ceph-users] OSD crashed during the fio test

2019-10-01 Thread Sasha Litvak
It was hardware indeed. The Dell server reported a disk being reset with power on. I am checking the usual suspects, i.e. controller firmware, controller event log (if I can get one), and drive firmware. I will report more when I get a better idea. Thank you!
On Tue, Oct 1, 2019 at 2:33 AM Brad Hubbard wrote

Re: [ceph-users] OSD crashed during the fio test

2019-10-01 Thread Brad Hubbard
Removed ceph-de...@vger.kernel.org and added d...@ceph.io
On Tue, Oct 1, 2019 at 4:26 PM Alex Litvak wrote:
> Hello everyone,
>
> Can you shed light on the cause of the crash? Could a client request actually trigger it?
>
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:5

[ceph-users] OSD crashed during the fio test

2019-09-30 Thread Alex Litvak
Hello everyone,
Can you shed light on the cause of the crash? Could a client request actually trigger it?
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) aio_submit retries 16
Sep 30 22:52:58 sto