[ceph-users] Re: Nautilus cluster damaged + crashing OSDs

2020-04-21 Thread Jonas Jelten
Hi! Since you are on Nautilus and I was on Mimic back then, the messages may have changed. The script only automates deleting many broken PGs, so you can perform the procedure by hand first: go through the steps of my state machine manually and identify the right messages, and
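Jonas's script is not shown in this thread, so the following is only a hypothetical helper for the manual route he describes: scan an OSD log for the message you identified by hand and count which PG ids appear on those lines. The `marker` string is a placeholder you must supply yourself, and the assumption that PG-level log lines carry a `pg[<pgid>(` prefix should be checked against your own Nautilus logs.

```python
import re
from collections import Counter

# ASSUMPTION: PG-level OSD log lines contain a prefix like "pg[2.1f( ...".
# PG ids are "<pool>.<hex seed>", plus "s<shard>" on erasure-coded pools.
PG_PREFIX_RE = re.compile(r"pg\[(\d+\.[0-9a-f]+(?:s\d+)?)\(")


def pgids_on_marked_lines(lines, marker):
    """Count PG ids on log lines that contain `marker`.

    `marker` is a hypothetical placeholder for whatever crash or
    assert text you identified by hand in your own OSD log.
    """
    hits = Counter()
    for line in lines:
        if marker in line:
            hits.update(PG_PREFIX_RE.findall(line))
    return hits
```

Passing an open log file as `lines` works because a file object iterates line by line; `.most_common()` on the result lists the most frequently affected PGs first.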

2020-04-21 Thread Robert Sander
Hi Jonas,

On 21.04.20 14:47, Jonas Jelten wrote:
> I hope my script still works for you. If you need any help, I'll see what I
> can do :)

The script currently does not find the info it needs and wants us to increase the logging level. We set the logging level to 10 and tried to restart the
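Robert does not say which subsystem's level they raised. Assuming it is `debug_osd`, one way to do it for an OSD that keeps crashing is to store the level in the monitors' centralized config (available since Mimic), which the OSD reads when it starts, and then restart it. The `ceph-osd@<id>` systemd unit name is an assumption that holds for package-based, non-containerized deployments.

```python
from __future__ import annotations

import subprocess


def debug_osd_cmd(osd_id: int, level: int = 10) -> list[str]:
    """Store debug_osd=<level> for one OSD in the monitors' config database.

    A crashed OSD cannot receive `ceph tell ... injectargs`, but it
    fetches the centralized config again when it starts.
    """
    return ["ceph", "config", "set", f"osd.{osd_id}", "debug_osd", str(level)]


def restart_cmd(osd_id: int) -> list[str]:
    # ASSUMPTION: package-based deployment using the ceph-osd@<id> systemd unit.
    return ["systemctl", "restart", f"ceph-osd@{osd_id}"]


def raise_logging_and_restart(osd_id: int, level: int = 10) -> None:
    """Run both commands in order; stop on the first failure."""
    for cmd in (debug_osd_cmd(osd_id, level), restart_cmd(osd_id)):
        subprocess.run(cmd, check=True)
```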

2020-04-21 Thread Jonas Jelten
Hi! Yes, it looks like you hit the same bug. My corruption back then happened because the server was out of memory, and the OSDs restarted and crashed quickly, again and again, for quite some time... What I think happens is that the journals somehow get out of sync between OSDs, which is something

2020-04-21 Thread Paul Emmerich
On Tue, Apr 21, 2020 at 12:44 PM Brad Hubbard wrote:
>
> On Tue, Apr 21, 2020 at 6:35 PM Paul Emmerich wrote:
> >
> > On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote:
> > >
> > > Wait for recovery to finish so you know whether any data from the down
> > > OSDs is required. If not just

2020-04-21 Thread Brad Hubbard
On Tue, Apr 21, 2020 at 6:35 PM Paul Emmerich wrote:
>
> On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote:
> >
> > Wait for recovery to finish so you know whether any data from the down
> > OSDs is required. If not just reprovision them.
>
> Recovery will not finish from this state as several

2020-04-21 Thread Robert Sander
Hi,

On 21.04.20 10:33, Paul Emmerich wrote:
> On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote:
>>
>> Wait for recovery to finish so you know whether any data from the down
>> OSDs is required. If not just reprovision them.
>
> Recovery will not finish from this state as several PGs are down

2020-04-21 Thread Marc Roos
: ceph-users
Subject: [ceph-users] Re: Nautilus cluster damaged + crashing OSDs

On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote:
>
> Wait for recovery to finish so you know whether any data from the down
> OSDs is required. If not just reprovision them.

Recovery will not fi

2020-04-21 Thread Paul Emmerich
On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote:
>
> Wait for recovery to finish so you know whether any data from the down
> OSDs is required. If not just reprovision them.

Recovery will not finish from this state as several PGs are down and/or stale.

Paul

>
> If data is required from the
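To see which PGs are blocking recovery, one option is to list them with `ceph pg ls -f json` and filter by state. The sketch below assumes the Nautilus JSON layout (a dict with a `pg_stats` list whose entries carry `pgid` and a `+`-joined `state` string such as `stale+active+clean`) and falls back to a bare list; verify the layout on your own cluster before relying on it.

```python
from __future__ import annotations

import json
import subprocess


def load_pg_stats(raw: str) -> list[dict]:
    data = json.loads(raw)
    # ASSUMPTION: Nautilus returns {"pg_ready": ..., "pg_stats": [...]};
    # accept a bare list as well in case the layout differs.
    if isinstance(data, dict):
        return data.get("pg_stats", [])
    return data


def blocking_pgs(pg_stats: list[dict], states=("down", "stale")) -> dict[str, str]:
    """Map pgid -> state for PGs that carry any of `states` as a state flag."""
    result = {}
    for pg in pg_stats:
        flags = pg.get("state", "").split("+")
        if any(s in flags for s in states):
            result[pg["pgid"]] = pg["state"]
    return result


def fetch_pg_ls() -> str:
    # Needs a working admin keyring on the host running this.
    return subprocess.run(
        ["ceph", "pg", "ls", "-f", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
```

Splitting on `+` and matching whole flags avoids false hits where one state name is a substring of another.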

2020-04-20 Thread Brad Hubbard
Wait for recovery to finish so you know whether any data from the down OSDs is required. If not, just reprovision them.

If data is required from the down OSDs, you will need to run a query on the PG(s) to find out which OSDs hold copies of the required PG/objects. You can then export the
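Brad's last step is truncated in the thread, so this sketch assumes it refers to exporting the PG with `ceph-objectstore-tool` from an OSD that still holds a copy. It parses `ceph pg <pgid> query` output and collects OSD ids from the peering state. The field names `recovery_state`, `probing_osds` and `down_osds_we_would_probe`, and the `"4(1)"` shard notation, are assumptions to check against a real query. `ceph-objectstore-tool` only works on a stopped OSD, and the default data path below is also an assumption.

```python
from __future__ import annotations

import json
import subprocess


def query_pg(pgid: str) -> dict:
    out = subprocess.run(
        ["ceph", "pg", pgid, "query"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def candidate_osds(query: dict) -> set[int]:
    """OSD ids the PG's peering state wants to probe (assumed field names)."""
    osds: set[int] = set()
    for state in query.get("recovery_state", []):
        for shard in state.get("probing_osds", []):
            # Erasure-coded shards may be printed as "4(1)": keep the OSD id.
            osds.add(int(str(shard).split("(")[0]))
        for osd in state.get("down_osds_we_would_probe", []):
            osds.add(int(osd))
    return osds


def export_cmd(osd_id: int, pgid: str, out_file: str,
               data_root: str = "/var/lib/ceph/osd") -> list[str]:
    """ceph-objectstore-tool export of one PG; stop the OSD before running it."""
    return [
        "ceph-objectstore-tool",
        "--data-path", f"{data_root}/ceph-{osd_id}",
        "--pgid", pgid,
        "--op", "export",
        "--file", out_file,
    ]
```

The usual counterpart is `--op import --file <file>` on another stopped OSD; whether that was Brad's next step is not visible in the truncated message.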