Re: [ceph-users] Slow Request on OSD

2016-09-02 Thread Reed Dier
Just to circle back to this: Drives: Seagate ST8000NM0065. Controller: LSI 3108 RAID-on-Chip. At the time, no BBU on the RoC controller. Each OSD drive was configured as a single RAID0 VD. What I believe to be the snake that bit us was the Seagate drives’ on-board caching. Using storcli to manage
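(For context: forcing write-through and disabling the on-drive cache on each RAID0 VD with storcli would look roughly like the sketch below; /c0 and /vall are placeholder controller/VD addresses, not values taken from the thread.)

  storcli64 /c0/vall set wrcache=wt    # controller write cache to write-through (no BBU present)
  storcli64 /c0/vall set pdcache=off   # turn off the physical drives' own on-board cache
  storcli64 /c0/vall show all          # verify the resulting cache settings per VD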

Re: [ceph-users] Slow Request on OSD

2016-09-01 Thread Dan Jakubiec
Thank you for all the help, Wido: > On Sep 1, 2016, at 14:03, Wido den Hollander wrote: > You have to mark those OSDs as lost and also force create the incomplete PGs. This might be the root of our problems. We didn't mark the parent OSD as "lost" before we removed it.
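(For context: on jewel, marking the removed OSD as lost and force-creating the incomplete PGs, as Wido suggests, would be along these lines; the OSD id 5 and PG id 1.2f are placeholders, a sketch rather than commands quoted from the thread.)

  ceph osd lost 5 --yes-i-really-mean-it   # declare the removed OSD's data permanently gone
  ceph pg 1.2f query                       # see which OSDs the incomplete PG still wants to probe
  ceph pg force_create_pg 1.2f             # recreate the PG empty; any data held only on lost OSDs is gone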

Re: [ceph-users] Slow Request on OSD

2016-09-01 Thread Dan Jakubiec
Thanks Wido. Reed and I have been working together to try to restore this cluster for about 3 weeks now. I have been accumulating a number of failure modes that I am hoping to share with the Ceph group soon, but have been holding off a bit until we see the full picture clearly so that we can

Re: [ceph-users] Slow Request on OSD

2016-09-01 Thread Cloud List
On Thu, Sep 1, 2016 at 3:50 PM, Nick Fisk wrote: > On 31 August 2016 at 23:21, Reed Dier wrote: > > Multiple XFS corruptions, multiple leveldb issues. Looked to be the result of write cache settings which have been adjusted now.

Re: [ceph-users] Slow Request on OSD

2016-09-01 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido > den Hollander > Sent: 01 September 2016 08:19 > To: Reed Dier <reed.d...@focusvq.com> > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Slow Req

Re: [ceph-users] Slow Request on OSD

2016-09-01 Thread Wido den Hollander
> On 31 August 2016 at 23:21, Reed Dier wrote: > > Multiple XFS corruptions, multiple leveldb issues. Looked to be the result of write cache settings which have been adjusted now. That is bad news, really bad. > You’ll see below that there are tons of PGs in

Re: [ceph-users] Slow Request on OSD

2016-08-31 Thread Reed Dier
Multiple XFS corruptions, multiple leveldb issues. Looked to be the result of write cache settings, which have been adjusted now. You’ll see below that there are tons of PGs in bad states; the cluster was slowly but surely bringing the number of bad PGs down, but it seems to have hit a brick wall with
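(For context: enumerating the PGs stuck in bad states is typically done with something like the following; a hedged sketch, not output from this cluster.)

  ceph health detail                          # lists the problem PGs and why they are unhealthy
  ceph pg dump_stuck inactive unclean stale   # dump PGs stuck in the given states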

Re: [ceph-users] Slow Request on OSD

2016-08-31 Thread Wido den Hollander
> On 31 August 2016 at 22:56, Reed Dier wrote: > > After a power failure left our jewel cluster crippled, I have hit a sticking point in attempted recovery. > > Out of 8 OSDs, we likely lost 5-6; trying to salvage what we can. That's probably too much. How

[ceph-users] Slow Request on OSD

2016-08-31 Thread Reed Dier
After a power failure left our jewel cluster crippled, I have hit a sticking point in attempted recovery. Out of 8 OSDs, we likely lost 5-6; we are trying to salvage what we can. In addition to RADOS pools, we were also using CephFS, and the cephfs.metadata and cephfs.data pools likely lost plenty
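(For context: a first pass at assessing the damage would usually look something like this; a sketch, not commands or output from the cluster in question.)

  ceph -s         # overall health, OSD counts, PG state summary
  ceph osd tree   # which OSDs are up/in versus down/out
  ceph df         # per-pool usage, including cephfs.metadata and cephfs.data
  ceph mds stat   # whether the CephFS MDS can still come up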

[ceph-users] slow request on OSD replacement

2014-04-11 Thread Erwin Lubbers
Hi, We're using a 24 server / 48 OSD (3 replicas) Ceph cluster (version 0.67.3) for RBD storage only and it is working great, but if a failed disk is replaced by a brand new one and the system starts to backfill, it gives a lot of slow request messages for 5 to 10 minutes. Then it does become
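(For context: the usual mitigation is to throttle backfill and recovery so client I/O keeps priority; a sketch assuming dumpling-era option names, with illustrative values.)

  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'
  # make the change persistent by setting the same options under [osd] in ceph.conf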