Re: [ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())

2019-10-30 Thread Brad Hubbard
Excuse the top-posting. When looking at the logs it helps to filter by the actual thread that crashed.
$ grep 7f08af3b6700 ceph-osd.27.log.last.error.txt | tail -15
-1001> 2019-10-30 12:55:41.498823 7f08af3b6700 1 -- 129.20.199.93:6803/977508 --> 129.20.199.7:0/2975967502 --
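A minimal sketch of that filtering workflow, assuming the same log file as above (the thread id is the hex value that appears in the assert/backtrace lines; both file name and id will differ on other OSDs):

$ grep -m1 'FAILED assert' ceph-osd.27.log.last.error.txt   # locate the assert line and note its thread id
$ grep 7f08af3b6700 ceph-osd.27.log.last.error.txt | tail -15   # last events from that thread before the crash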

Re: [ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())

2019-10-30 Thread Jérémy Gardais
The "best" health i was able to get was : HEALTH_ERR norecover flag(s) set; 1733/37482459 objects misplaced (0.005%); 5 scrub errors; Possible data damage: 2 pgs inconsistent; Degraded data redundancy: 7461/37482459 objects degraded (0.020%), 24 pgs degraded, 2 pgs undersized OSDMAP_FLAGS

Re: [ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())

2019-10-30 Thread Jérémy Gardais
Thus spake Brad Hubbard (bhubb...@redhat.com) on Wednesday 30 October 2019 at 12:50:50: > Maybe you should set nodown and noout while you do these maneuvers? > That will minimise peering and recovery (data movement). As the commands don't take too long, I just had a few slow requests before the
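A minimal sketch of the flag handling Brad suggests (standard ceph CLI; remember to unset the flags once the maneuvers are done):

$ ceph osd set nodown    # keep OSDs from being marked down while they are restarted
$ ceph osd set noout     # keep OSDs from being marked out, so no rebalancing starts
# ... perform the restarts / exports ...
$ ceph osd unset nodown
$ ceph osd unset noout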

Re: [ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())

2019-10-29 Thread Brad Hubbard
On Tue, Oct 29, 2019 at 9:09 PM Jérémy Gardais wrote: > > Thus spake Brad Hubbard (bhubb...@redhat.com) on Tuesday 29 October 2019 at 08:20:31: > > Yes, try and get the pgs healthy, then you can just re-provision the down > > OSDs. > > > > Run a scrub on each of these pgs and then use the

Re: [ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())

2019-10-29 Thread Jérémy Gardais
Thus spake Brad Hubbard (bhubb...@redhat.com) on Tuesday 29 October 2019 at 08:20:31: > Yes, try and get the pgs healthy, then you can just re-provision the down > OSDs. > > Run a scrub on each of these pgs and then use the commands on the > following page to find out more information for each

Re: [ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())

2019-10-28 Thread Brad Hubbard
Yes, try and get the pgs healthy, then you can just re-provision the down OSDs. Run a scrub on each of these pgs and then use the commands on the following page to find out more information for each case. https://docs.ceph.com/docs/luminous/rados/troubleshooting/troubleshooting-pg/ Focus on the
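A hedged sketch of that per-pg inspection, following the linked troubleshooting page (the pg id 2.1a is only an example, not one of the pgs from this thread):

$ ceph pg scrub 2.1a                                       # (re)scrub one of the inconsistent pgs
$ ceph pg deep-scrub 2.1a                                  # deep scrub if the light scrub turns up nothing
$ rados list-inconsistent-obj 2.1a --format=json-pretty    # per-object detail about the scrub errors
$ ceph pg repair 2.1a                                      # only once the nature of the damage is understood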

[ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())

2019-10-28 Thread Jérémy Gardais
Hello, For several weeks I have had some OSDs flapping before being marked out of the cluster by Ceph… I was hoping for some Ceph magic and just gave it some time to auto-heal (and be able to do all the side work…) but it was a bad idea (what a surprise :D). Also got some inconsistent PGs, but I was