Re: [ceph-users] Manually mucked up pg, need help fixing

2014-05-05 Thread Jeff Bachtel
noout was set while I manhandled osd.4 in and out of the cluster repeatedly (trying to copy data from other osds and set attrs to make osd.4 pick up that it had objects in pg 0.2f). It wasn't set before the problem, and it isn't set currently. I don't really know where you saw pool size = 1:
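
For reference, both points can be checked from a monitor node; the pool name "data" below is an assumption (pg 0.2f lives in pool 0, which was the default "data" pool on clusters of that era):

    # cluster-wide flags; "noout" appears here if it is still set
    ceph osd dump | grep flags

    # replication size of the pool backing pg 0.2f
    ceph osd pool get data size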

Re: [ceph-users] Manually mucked up pg, need help fixing

2014-05-05 Thread Jeff Bachtel
Thanks, that is a cool utility. Unfortunately, I'm pretty sure the pg in question held cephfs objects rather than rbd images (because mounting cephfs is the only noticeable brokenness). Jeff On 05/05/2014 06:43 PM, Jake Young wrote: I was in a similar situation where I could see the PG's data on
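
One way to tell what kind of objects a pg holds is to list its directory on the osd's filestore; the path below assumes the default layout. RBD data objects carry an "rb." or "rbd_data." prefix, while cephfs file data lives in objects named after the file's inode, e.g. 10000000abc.00000000:

    # on the host carrying osd.4 (default filestore path; adjust to your layout)
    ls /var/lib/ceph/osd/ceph-4/current/0.2f_head/ | head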

Re: [ceph-users] Manually mucked up pg, need help fixing

2014-05-05 Thread Gregory Farnum
Oh, you've got noout set. Did you lose an OSD at any point? Are you really running the system with pool size 1? I think you've managed to erase the up-to-date data, but not the records of that data's existence. You'll have to explore the various "lost" commands, but I'm not sure what the right app
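
The "lost" machinery referred to here includes at least the following; both are destructive and should only be run once it is certain the data cannot be recovered from any osd (the ids below are just the ones from this thread):

    # declare osd.4's data permanently gone
    ceph osd lost 4 --yes-i-really-mean-it

    # or give up on the unfound objects of a single pg
    ceph pg 0.2f mark_unfound_lost revert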

Re: [ceph-users] Manually mucked up pg, need help fixing

2014-05-05 Thread Jake Young
I was in a similar situation where I could see the PG's data on an osd, but there was nothing I could do to force the pg to use that osd's copy. I ended up using the rbd_restore tool to recreate my rbd as a file on disk, and then I reimported it into the pool. See this thread for info on rbd_restore: http://ww
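
A rough sketch of the reimport step described above, assuming rbd_restore (or a similar script) has already reassembled the image into a flat file; the file and image names are placeholders:

    # push the reconstructed image back into the cluster
    rbd import ./myimage.img rbd/myimage-restored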

Re: [ceph-users] Manually mucked up pg, need help fixing

2014-05-05 Thread Jeff Bachtel
Well, that'd be the ideal solution. Please check out the github gist I posted, though. It seems that despite osd.4 having nothing good for pg 0.2f, the cluster does not acknowledge that any other osd has a copy of the pg. I've tried downing osd.4 and manually deleting the pg directory in question wi
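
Two commands that show what the cluster currently believes about the pg (the pgid is the one from this thread):

    # which osds the pg maps to right now (up/acting set)
    ceph pg map 0.2f

    # detailed peering state, including which osds might hold unfound objects
    ceph pg 0.2f query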

Re: [ceph-users] Manually mucked up pg, need help fixing

2014-05-05 Thread Gregory Farnum
What's your cluster look like? I wonder if you can just remove the bad PG from osd.4 and let it recover from the existing osd.1 -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Sat, May 3, 2014 at 9:17 AM, Jeff Bachtel wrote: > This is all on firefly rc1 on CentOS 6 > > I ha
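
A minimal sketch of that approach on a CentOS 6 (sysvinit) host, assuming the default filestore path; moving the directory aside rather than deleting it keeps a way back:

    # on the osd.4 host
    service ceph stop osd.4
    mv /var/lib/ceph/osd/ceph-4/current/0.2f_head /root/pg-0.2f-backup
    service ceph start osd.4

    # then watch recovery/backfill progress
    ceph -w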

[ceph-users] Manually mucked up pg, need help fixing

2014-05-03 Thread Jeff Bachtel
This is all on firefly rc1 on CentOS 6. I had an osd getting overfull, and, misinterpreting directions, I downed it and then manually removed pg directories from the osd mount. On restart, and after a good deal of rebalancing (setting osd weights as I should've done originally), I'm now at cluster de
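
For reference, the usual non-destructive ways to drain an overfull osd are reweighting rather than removing pg directories by hand; the osd id and weight below are illustrative only:

    # lower the crush weight so data migrates off the full osd
    ceph osd crush reweight osd.4 0.8

    # or let ceph pick new reweights based on utilization
    ceph osd reweight-by-utilization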