Just to circle back to this:
Drives: Seagate ST8000NM0065
Controller: LSI 3108 RAID-on-Chip
At the time, no BBU on RoC controller.
Each OSD drive was configured as a single RAID0 VD.
What I believe to be the snake that bit us was the Seagate drives’ on-board
caching.
Using storcli to manage the controller.
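For anyone following along, this is roughly how we've been checking and
flipping the cache settings with storcli; a sketch only, assuming the
controller enumerates as /c0 (verify with 'storcli show' first):

    # Confirm the controller index (assumed /c0 below).
    storcli show

    # Show current cache settings on all virtual drives.
    storcli /c0/vall show all

    # Force write-through on every VD; with no BBU, write-back is unsafe.
    storcli /c0/vall set wrcache=wt

    # Disable the drives' own on-board cache, the suspected culprit here.
    storcli /c0/vall set pdcache=off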
Thank you for all the help, Wido:
> On Sep 1, 2016, at 14:03, Wido den Hollander wrote:
>
> You have to mark those OSDs as lost and also force create the incomplete PGs.
>
This might be the root of our problems. We didn't mark the parent OSD as
"lost" before we removed it.
Thanks, Wido. Reed and I have been working together to try to restore this
cluster for about 3 weeks now. I have been accumulating a number of failure
modes that I am hoping to share with the Ceph group soon, but have been holding
off a bit until we see the full picture clearly so that we can
On Thu, Sep 1, 2016 at 3:50 PM, Nick Fisk wrote:
> > > On 31 August 2016 at 23:21, Reed Dier wrote:
> > >
> > >
> > > Multiple XFS corruptions, multiple leveldb issues. Looked to be the
> > > result of write cache settings, which have been adjusted now.
>
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido
> den Hollander
> Sent: 01 September 2016 08:19
> To: Reed Dier <reed.d...@focusvq.com>
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Slow Req
> On 31 August 2016 at 23:21, Reed Dier wrote:
>
>
> Multiple XFS corruptions, multiple leveldb issues. Looked to be the result
> of write cache settings, which have been adjusted now.
>
That is bad news, really bad.
> You’ll see below that there are tons of PGs in
Multiple XFS corruptions, multiple leveldb issues. Looked to be the result of
write cache settings, which have been adjusted now.
You’ll see below that there are tons of PGs in bad states; recovery was slowly
but surely bringing the number of bad PGs down, but it seems to have hit a
brick wall with
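For reference, these are the commands we've been using to enumerate the bad
PGs, all stock jewel tooling (PG 1.2f below is a placeholder):

    # Summarize which PGs are unhealthy and why.
    ceph health detail

    # Dump PGs stuck in particular states.
    ceph pg dump_stuck inactive
    ceph pg dump_stuck unclean

    # Query a single PG to see which down OSDs it is blocked on.
    ceph pg 1.2f query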
> On 31 August 2016 at 22:56, Reed Dier wrote:
>
>
> After a power failure left our jewel cluster crippled, I have hit a sticking
> point in attempted recovery.
>
> Out of 8 OSDs, we likely lost 5-6; we're trying to salvage what we can.
>
That's probably too much. How
After a power failure left our jewel cluster crippled, I have hit a sticking
point in attempted recovery.
Out of 8 OSDs, we likely lost 5-6; we're trying to salvage what we can.
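Where a failed OSD's filesystem still mounts, we've been trying to copy PGs
off it with ceph-objectstore-tool before writing the drive off; a sketch, with
the OSD ids, paths, and PG id as placeholders (the OSD daemons must be stopped
first):

    # List the PGs a stopped OSD still holds.
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
        --journal-path /var/lib/ceph/osd/ceph-3/journal --op list-pgs

    # Export one PG to a file...
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
        --journal-path /var/lib/ceph/osd/ceph-3/journal \
        --op export --pgid 1.2f --file /root/1.2f.export

    # ...then import it into a healthy (also stopped) OSD.
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
        --journal-path /var/lib/ceph/osd/ceph-7/journal \
        --op import --file /root/1.2f.export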
In addition to RADOS pools, we were also using CephFS, and the cephfs.metadata
and cephfs.data pools likely lost plenty
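If the metadata pool turns out to be damaged, the jewel CephFS
disaster-recovery tools are what we plan to try; a rough sketch only, since
the later steps are destructive (export the journal first; the pool name
cephfs.data is specific to our cluster):

    # Back up the MDS journal before touching anything.
    cephfs-journal-tool journal export /root/mds-journal.bin

    # Write salvageable dentries from the journal back to the metadata pool.
    cephfs-journal-tool event recover_dentries summary

    # Destructive: reset the journal and session table afterwards.
    cephfs-journal-tool journal reset
    cephfs-table-tool all reset session

    # Last resort: rebuild metadata by scanning the data pool.
    cephfs-data-scan scan_extents cephfs.data
    cephfs-data-scan scan_inodes cephfs.data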
Hi,
We're using a 24-server / 48-OSD (3 replicas) Ceph cluster (version 0.67.3) for
RBD storage only and it is working great, but when a failed disk is replaced by
a brand-new one and the system starts to backfill, we get a lot of slow-request
messages for 5 to 10 minutes. Then it does become
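In case it helps anyone landing here from a search, the usual workaround is to
throttle backfill so client I/O keeps breathing; a sketch with illustrative
values (repeat per OSD id, or use 'osd.*' if your release accepts the
wildcard):

    # Throttle backfill/recovery at runtime.
    ceph tell osd.0 injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

    # Or persist it in ceph.conf under [osd]:
    #   osd max backfills = 1
    #   osd recovery max active = 1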