[ceph-users] Re: multiple OSD crash, unfound objects

2020-12-15 Thread Frank Schilder
The other option is what you describe: create a new data pool, make the fs root placed on this pool, and copy every file onto itself. This should also do the trick. However, with this method you will not be able to get rid of the broken pool. After the copy, you could, however,
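The "copy every file onto itself" step described above can be sketched as a shell loop. The `ceph`/`setfattr` lines are assumptions about cluster and mount-point names (`cephfs_data_new` and `/ceph` are hypothetical) and are left as comments; the rewrite loop itself is demonstrated on a scratch directory, since it behaves identically on any filesystem.

```shell
# Hedged sketch of the rewrite-in-place approach. On a real cluster you
# would first add the new data pool and point the fs root layout at it
# (pool, fs, and path names below are assumptions, not from the thread):
#   ceph osd pool create cephfs_data_new 128
#   ceph fs add_data_pool cephfs cephfs_data_new
#   setfattr -n ceph.dir.layout.pool -v cephfs_data_new /ceph
# Then rewrite each file so its objects are written into the new pool.
# Demonstrated here on a scratch directory instead of a live CephFS mount:
dir=$(mktemp -d)
echo "payload" > "$dir/file1"
# Copy each file onto itself via a temp name, preserving attributes (-p)
find "$dir" -type f -exec sh -c 'cp -p "$1" "$1.tmp" && mv "$1.tmp" "$1"' _ {} \;
cat "$dir/file1"
```

On CephFS the file contents are unchanged by this, but the rewritten objects land in the new pool; as noted above, the old (broken) pool still cannot be deleted while it remains attached to the filesystem.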

[ceph-users] Re: multiple OSD crash, unfound objects

2020-12-15 Thread Michael Thomas
Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 22 November 2020 18:29:16 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multiple OSD crash, unfound objects On 10/23/20 3:07 AM, Frank Schilder wrote

[ceph-users] Re: multiple OSD crash, unfound objects

2020-11-22 Thread Frank Schilder
Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multiple OSD crash, unfound objects Hi Frank, From my understanding, with my current filesystem layout, I should be able to remove the "broken" pool once the data has been moved off of it. This is because the "broken"

[ceph-users] Re: multiple OSD crash, unfound objects

2020-11-22 Thread Frank Schilder
these will work. Please post your experience here. Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 22 November 2020 18:29:16 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re

[ceph-users] Re: multiple OSD crash, unfound objects

2020-11-22 Thread Michael Thomas
From: Michael Thomas Sent: 22 November 2020 18:29:16 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multiple OSD crash, unfound objects On 10/23/20 3:07 AM, Frank Schilder wrote: Hi Michael. I still don't see any traffic to the pool, though I'm al

[ceph-users] Re: multiple OSD crash, unfound objects

2020-11-22 Thread Michael Thomas
On 10/23/20 3:07 AM, Frank Schilder wrote: Hi Michael. I still don't see any traffic to the pool, though I'm also unsure how much traffic is to be expected. Probably not much. If ceph df shows that the pool contains some objects, I guess that's sorted. That osdmaptool crashes indicates tha

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-23 Thread Frank Schilder
Hi Michael. > I still don't see any traffic to the pool, though I'm also unsure how much traffic is to be expected. Probably not much. If ceph df shows that the pool contains some objects, I guess that's sorted. That osdmaptool crashes indicates that your cluster runs with corrupted interna

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-22 Thread Michael Thomas
regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 22 October 2020 09:32:07 To: Michael Thomas; ceph-users@ceph.io Subject: [ceph-users] Re: multiple OSD crash, unfound objects Sounds good. Did you re-create the pool again?

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-22 Thread Michael Thomas
ng client session (now defunct) has been blacklisted. I'll check back later to see if the slow OPS get cleared from 'ceph status'. Regards, --Mike From: Michael Thomas Sent: 20 October 2020 23:48:36 To: Frank Schilder; ceph-users@ceph.io Su

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-22 Thread Frank Schilder
Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 22 October 2020 09:32:07 To: Michael Thomas; ceph-users@ceph.io Subject: [ceph-users] Re: multiple OSD crash, unfound objects Sounds good. Did you re-create the pool

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-22 Thread Frank Schilder
later to see if the slow OPS get cleared from 'ceph status'. Regards, --Mike From: Michael Thomas Sent: 20 October 2020 23:48:36 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multiple OSD crash, unfound

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-21 Thread Michael Thomas
(now defunct) has been blacklisted. I'll check back later to see if the slow OPS get cleared from 'ceph status'. Regards, --Mike From: Michael Thomas Sent: 20 October 2020 23:48:36 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re:

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-21 Thread Frank Schilder
olves the issue (but tell the user :). Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 20 October 2020 23:48:36 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multipl

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-20 Thread Frank Schilder
Dear Michael, > Can you create a test pool with pg_num=pgp_num=1 and see if the PG gets an OSD mapping? I meant here with crush rule replicated_host_nvme. Sorry, forgot. > Yes, the OSD was still out when the previous health report was created. Hmm, this is odd. If this is correct, then

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-20 Thread Michael Thomas
On 10/20/20 1:18 PM, Frank Schilder wrote: Dear Michael, Can you create a test pool with pg_num=pgp_num=1 and see if the PG gets an OSD mapping? I meant here with crush rule replicated_host_nvme. Sorry, forgot. Seems to have worked fine: https://pastebin.com/PFgDE4J1 Yes, the OSD was st

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-19 Thread Michael Thomas
really see why the missing OSDs are not assigned to the two PGs 1.0 and 7.39d. Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 16 October 2020 15:41:29 To: Michael Thomas; ceph-users@ceph.io Subject: [ceph-users] Re: multiple O

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-19 Thread Michael Thomas
administrative, like peering attempts. Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 16 October 2020 15:09:20 To: Michael Thomas; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multiple OSD cras

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-16 Thread Frank Schilder
see if this has any effect. The crush rules and crush tree look OK to me. I can't really see why the missing OSDs are not assigned to the two PGs 1.0 and 7.39d. Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 From:

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-16 Thread Frank Schilder
Frank Schilder Sent: 16 October 2020 15:09:20 To: Michael Thomas; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multiple OSD crash, unfound objects Dear Michael, thanks for this initial work. I will need to look through the files you posted in more detail. In the meantime: Please mark OSD 41 as

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-16 Thread Frank Schilder
0 02:08:01 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multiple OSD crash, unfound objects On 10/14/20 3:49 PM, Frank Schilder wrote: > Hi Michael, it doesn't look too bad. All degraded objects are due to the undersized PG. If this is an EC pool with m

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-15 Thread Chad William Seys
This problem may also be related to the below unsolved issue, which specifically mentions 'unfound' objects. Sadly, there is probably nothing in the report which will help with your troubleshooting. https://tracker.ceph.com/issues/44286 C.

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-15 Thread Michael Thomas
t the incomplete PG resolved with the above, but it will move some issues out of the way before proceeding. Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 14 October 2020 20:52:10 To: Andreas

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-14 Thread Frank Schilder
Michael Thomas Sent: 14 October 2020 20:52:10 To: Andreas John; ceph-users@ceph.io Subject: [ceph-users] Re: multiple OSD crash, unfound objects Hello, The original cause of the OSD instability has already been fixed. It was due to user jobs (via condor) consuming too much memory and causing the machine

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-14 Thread Michael Thomas
backup. This will "park" the problem of cluster health for later fixing. Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-14 Thread Michael Thomas
From: Michael Thomas Sent: 09 October 2020 22:33:46 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multiple OSD crash, unfound objects Hi Frank, That was a good tip. I was able to move the broken file

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-10 Thread Andreas John
n and restore the now missing data from backup. This will "park" the problem of cluster health for later fixing. Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-10 Thread Frank Schilder
Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 09 October 2020 22:33:46 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multiple OSD crash, unfound objects Hi Frank, That was a good ti

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-09 Thread Michael Thomas
cluster health for later fixing. Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 18 September 2020 15:38:51 To: Michael Thomas; ceph-users@ceph.io Subject: [ceph-users] Re: multiple OSD crash, unfound objects Dear Micha

[ceph-users] Re: multiple OSD crash, unfound objects

2020-09-18 Thread Frank Schilder
"park" the problem of cluster health for later fixing. Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 18 September 2020 15:38:51 To: Michael Thomas; ceph-users@ceph.io Subject: [ceph-users] Re: mu

[ceph-users] Re: multiple OSD crash, unfound objects

2020-09-18 Thread Frank Schilder
Dear Michael, > I disagree with the statement that trying to recover health by deleting data is a contradiction. In some cases (such as mine), the data in ceph is backed up in another location (eg tape library). Restoring a few files from tape is a simple and cheap operation that takes a m

[ceph-users] Re: multiple OSD crash, unfound objects

2020-09-18 Thread Michael Thomas
Hi Frank, On 9/18/20 2:50 AM, Frank Schilder wrote: Dear Michael, firstly, I'm a bit confused why you started deleting data. The objects were unfound, but still there. That's a small issue. Now the data might be gone and that's a real issue. Interval: Anyone rea

[ceph-users] Re: multiple OSD crash, unfound objects

2020-09-18 Thread Frank Schilder
Dear Michael, firstly, I'm a bit confused why you started deleting data. The objects were unfound, but still there. That's a small issue. Now the data might be gone and that's a real issue. Interval: Anyone reading this: I have seen many threads where ceph admins s

[ceph-users] Re: multiple OSD crash, unfound objects

2020-09-17 Thread Michael Thomas
Hi Frank, Yes, it does sounds similar to your ticket. I've tried a few things to restore the failed files: * Locate a missing object with 'ceph pg $pgid list_unfound' * Convert the hex oid to a decimal inode number * Identify the affected file with 'find /ceph -inum $inode' At this point, I
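The oid-to-inode steps Michael lists can be sketched in shell. The sample oid below is made up for illustration, and the `ceph pg` / `find` commands assume a live cluster and a CephFS mount at /ceph, so they are shown only as comments; the hex-to-decimal conversion itself is plain shell arithmetic.

```shell
# Hedged sketch: map an unfound CephFS object back to its file.
# 1) List unfound objects in the affected PG (requires a live cluster):
#      ceph pg 7.39d list_unfound
# 2) A CephFS data object name has the form "<hex inode>.<stripe index>";
#    take the part before the dot and convert it from hex to decimal
#    (this oid is a hypothetical example, not from the thread):
oid="10000000fa5.00000000"
inode_hex=${oid%%.*}          # strip the stripe index
inode_dec=$((16#$inode_hex))  # base-16 to decimal
echo "$inode_dec"
# 3) Locate the affected file on the mounted filesystem:
#      find /ceph -inum "$inode_dec"
```

At that point the file can be restored from backup or dealt with as described in the rest of the thread.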

[ceph-users] Re: multiple OSD crash, unfound objects

2020-09-16 Thread Frank Schilder
Sounds similar to this one: https://tracker.ceph.com/issues/46847 If you have or can reconstruct the crush map from before adding the OSDs, you might be able to discover everything with the temporary reversal of the crush map method. Not sure if there is another method; I never got a reply to m