Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-12 Thread Oliver Daudey
Hey Yan, On 12-09-13 03:58, Yan, Zheng wrote: > On Thu, Sep 12, 2013 at 3:26 AM, Oliver Daudey > wrote: >> Hey Yan, >> >> Just confirming that creating fresh pools and doing the newfs on those >> fixed the problem, while restarting the OSDs didn't, thanks again! If >> yo

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-12 Thread Oliver Daudey
Hey Gregory, On Wed, 2013-09-11 at 11:36 -0700, Gregory Farnum wrote: > On Wed, Sep 11, 2013 at 7:48 AM, Yan, Zheng wrote: > > On Wed, Sep 11, 2013 at 10:06 PM, Oliver Daudey wrote: > >> Hey Yan, > >> > >> On 11-09-13 15:12, Yan, Zheng wrote: > >>> On Wed, Sep 11, 2013 at 7:51 PM, Oliver Daudey

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-11 Thread Yan, Zheng
On Thu, Sep 12, 2013 at 3:26 AM, Oliver Daudey wrote: > Hey Yan, > > Just confirming that creating fresh pools and doing the newfs on those > fixed the problem, while restarting the OSDs didn't, thanks again! If > you come up with a permanent fix, let me know and I'll test it for you. > > Here i

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-11 Thread Oliver Daudey
Hey Yan, Just confirming that creating fresh pools and doing the newfs on those fixed the problem, while restarting the OSDs didn't, thanks again! If you come up with a permanent fix, let me know and I'll test it for you. Regards, Oliver On Wed, 2013-09-11 at 22:48 +0800, Yan, Zheng w

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-11 Thread Gregory Farnum
On Wed, Sep 11, 2013 at 7:48 AM, Yan, Zheng wrote: > On Wed, Sep 11, 2013 at 10:06 PM, Oliver Daudey wrote: >> Hey Yan, >> >> On 11-09-13 15:12, Yan, Zheng wrote: >>> On Wed, Sep 11, 2013 at 7:51 PM, Oliver Daudey wrote: Hey Gregory, I wiped and re-created the MDS-cluster I just m

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-11 Thread Yan, Zheng
On Wed, Sep 11, 2013 at 10:06 PM, Oliver Daudey wrote: > Hey Yan, > > On 11-09-13 15:12, Yan, Zheng wrote: >> On Wed, Sep 11, 2013 at 7:51 PM, Oliver Daudey wrote: >>> Hey Gregory, >>> >>> I wiped and re-created the MDS-cluster I just mailed about, starting out >>> by making sure CephFS is not mo

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-11 Thread Oliver Daudey
Hey Yan, On 11-09-13 15:12, Yan, Zheng wrote: > On Wed, Sep 11, 2013 at 7:51 PM, Oliver Daudey wrote: >> Hey Gregory, >> >> I wiped and re-created the MDS-cluster I just mailed about, starting out >> by making sure CephFS is not mounted anywhere, stopping all MDSs, >> completely cleaning the "dat

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-11 Thread Yan, Zheng
On Wed, Sep 11, 2013 at 9:12 PM, Yan, Zheng wrote: > On Wed, Sep 11, 2013 at 7:51 PM, Oliver Daudey wrote: >> Hey Gregory, >> >> I wiped and re-created the MDS-cluster I just mailed about, starting out >> by making sure CephFS is not mounted anywhere, stopping all MDSs, >> completely cleaning the

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-11 Thread Yan, Zheng
On Wed, Sep 11, 2013 at 7:51 PM, Oliver Daudey wrote: > Hey Gregory, > > I wiped and re-created the MDS-cluster I just mailed about, starting out > by making sure CephFS is not mounted anywhere, stopping all MDSs, > completely cleaning the "data" and "metadata"-pools using "rados > --pool= cleanup

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-11 Thread Oliver Daudey
Hey Gregory, I wiped and re-created the MDS-cluster I just mailed about, starting out by making sure CephFS is not mounted anywhere, stopping all MDSs, completely cleaning the "data" and "metadata"-pools using "rados --pool= cleanup ", then creating a new cluster using `ceph mds newfs 1 0 --yes-i-

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-11 Thread Oliver Daudey
Hey Gregory, FYI: I just attempted to upgrade a second cluster where CephFS was in use and got this: -24> 2013-09-11 12:00:36.674469 7f0de1438700 1 -- 194.109.43.76:6800/8335 --> 194.109.43.73:6789/0 -- mon_subscribe({mdsmap=525898+,monmap=26+,osdmap=529384}) v2 -- ?+0 0x35b0700 con 0x35a9580

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Oliver Daudey
Hey Gregory, On Tue, 2013-09-10 at 14:48 -0700, Gregory Farnum wrote: > On Tue, Sep 10, 2013 at 2:36 PM, Oliver Daudey wrote: > > Hey Gregory, > > > > My cluster consists of 3 nodes, each running 1 mon, 1 osd and 1 mds. I > > upgraded from 0.67, but was still running 0.61.7 OSDs at the time of th

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Oliver Daudey
Hey Gregory, Ok, thanks for all your help! It's weird, as if the object gets deleted somewhere along the way, but the problem only becomes visible once you restart the MDSs, which probably have it in memory and then fail to load it after restart. I'll answer the questions you had about my test-s

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Oliver Daudey
Hey Gregory, Thanks for your explanation. Turns out to be 1.a7 and it seems to scrub OK. # ceph osd getmap -o osdmap # osdmaptool --test-map-object mds_anchortable --pool 1 osdmap osdmaptool: osdmap file 'osdmap' object 'mds_anchortable' -> 1.a7 -> [2,0] # ceph pg scrub 1.a7 osd.2 logs: 2013-0
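The object-to-PG lookup quoted in the message above can be scripted when the PG id has to be fed into a later `ceph pg scrub` call. The following is only a sketch under assumptions: `parse_map_object` is a hypothetical helper, not part of Ceph, and its pattern is fitted to the single `osdmaptool --test-map-object` output line quoted in this thread, so other Ceph versions may print something different.

```python
import re

# Output line copied from the osdmaptool run quoted in the message above.
SAMPLE = "osdmaptool: osdmap file 'osdmap' object 'mds_anchortable' -> 1.a7 -> [2,0]"

# Assumed layout: object '<name>' -> <pool>.<hex pg> -> [<osd>,<osd>,...]
_MAP_LINE = re.compile(
    r"object '(?P<obj>[^']+)' -> (?P<pg>\d+\.[0-9a-f]+) -> \[(?P<osds>[\d,]*)\]"
)


def parse_map_object(line):
    """Hypothetical helper: return (object name, PG id, list of OSD ids)."""
    match = _MAP_LINE.search(line)
    if match is None:
        raise ValueError("unrecognised osdmaptool output: %r" % line)
    # The OSD list may be empty, so skip empty fragments when converting.
    osds = [int(part) for part in match.group("osds").split(",") if part]
    return match.group("obj"), match.group("pg"), osds


if __name__ == "__main__":
    obj, pg, osds = parse_map_object(SAMPLE)
    # Build the scrub command for the PG that holds the object.
    print("ceph pg scrub " + pg)
```

The helper raises on lines it does not recognise rather than guessing, since scrubbing the wrong PG would say nothing about the missing object.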

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Gregory Farnum
Nope, a repair won't change anything if scrub doesn't detect any inconsistencies. There must be something else going on, but I can't fathom what...I'll try and look through it a bit more tomorrow. :/ -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Sep 10, 2013 at 3:49 P

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Oliver Daudey
Hey Gregory, On 10-09-13 20:21, Gregory Farnum wrote: > On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey wrote: >> Hey list, >> >> I just upgraded to Ceph 0.67.3. What I did on every node of my 3-node >> cluster was: >> - Unmount CephFS everywhere. >> - Upgrade the Ceph-packages. >> - Restart MON

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Oliver Daudey
Hey Gregory, My cluster consists of 3 nodes, each running 1 mon, 1 osd and 1 mds. I upgraded from 0.67, but was still running 0.61.7 OSDs at the time of the upgrade, because of performance-issues that have just recently been fixed. These have now been upgraded to 0.67.3, along with the rest of C

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Gregory Farnum
On Tue, Sep 10, 2013 at 2:36 PM, Oliver Daudey wrote: > Hey Gregory, > > My cluster consists of 3 nodes, each running 1 mon, 1 osd and 1 mds. I > upgraded from 0.67, but was still running 0.61.7 OSDs at the time of the > upgrade, because of performance-issues that have just recently been > fixed.

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Oliver Daudey
Hey Gregory, The only objects containing "table" I can find at all, are in the "metadata"-pool: # rados --pool=metadata ls | grep -i table mds0_inotable Looking at another cluster where I use CephFS, there is indeed an object named "mds_anchortable", but the broken cluster is missing it. I don't

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Gregory Farnum
If the problem is somewhere in RADOS/xfs/whatever, then there's a good chance that the "mds_anchortable" object exists in its replica OSDs, but when listing objects those aren't queried, so they won't show up in a listing. You can use the osdmaptool to map from an object name to the PG it would sho

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Liu, Larry
This is scary. Should I hold off on upgrading? On 9/10/13 11:33 AM, "Oliver Daudey" wrote: >Hey Gregory, > >On 10-09-13 20:21, Gregory Farnum wrote: >> On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey >>wrote: >>> Hey list, >>> >>> I just upgraded to Ceph 0.67.3. What I did on every node of my 3-node

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Gregory Farnum
Also, can you scrub the PG which contains the "mds_anchortable" object and see if anything comes up? You should be able to find the key from the logs (in the osd_op line that contains "mds_anchortable") and convert that into the PG. Or you can just scrub all of osd 2. -Greg Software Engineer #42 @

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Gregory Farnum
It's not an upgrade issue. There's an MDS object that is somehow missing. If it exists, then on restart you'll be fine. Oliver, what is your general cluster config? What filesystem are your OSDs running on? What version of Ceph were you upgrading from? There's really no way for this file to not ex

[ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Oliver Daudey
Hey list, I just upgraded to Ceph 0.67.3. What I did on every node of my 3-node cluster was: - Unmount CephFS everywhere. - Upgrade the Ceph-packages. - Restart MON. - Restart OSD. - Restart MDS. As soon as I got to the second node, the MDS crashed right after startup. Part of the logs (more on

Re: [ceph-users] CephFS no longer mounts and asserts in MDS after upgrade to 0.67.3

2013-09-10 Thread Gregory Farnum
On Tue, Sep 10, 2013 at 10:54 AM, Oliver Daudey wrote: > Hey list, > > I just upgraded to Ceph 0.67.3. What I did on every node of my 3-node > cluster was: > - Unmount CephFS everywhere. > - Upgrade the Ceph-packages. > - Restart MON. > - Restart OSD. > - Restart MDS. > > As soon as I got to the