Re: Help...MDS Continuously Segfaulting

2012-11-03 Thread Gregory Farnum
Sage merged it into master, so whatever you like. If you remove the patch and the error happens again, your MDS will fail on replay as it did here. If you leave it in, it has no effect other than handling that particular bad case. -Greg On Tue, Oct 30, 2012 at 3:22 AM, Nick Couchman

Re: Help...MDS Continuously Segfaulting

2012-11-03 Thread Gregory Farnum
It should apply cleanly on top of 0.48.2. There may be a 0.48.3, but it won't be driven by this patch. -Greg On Sat, Nov 3, 2012 at 7:27 PM, Nick Couchman nick.couch...@seakr.com wrote: Okay - I'm planning to try to go to version 0.48.2, the latest stable - is the patch available for that

Re: Help...MDS Continuously Segfaulting

2012-10-29 Thread Nick Couchman
Okay, that patch worked and it seems to be running again. Should I continue to run with that patch, or go back to the original binaries? Gregory Farnum 10/19/12 4:16 PM I've written a small patch on top of v0.48.1argonaut which should avoid this. It's in branch 3369-mds-session-workaround

Re: Help...MDS Continuously Segfaulting

2012-10-19 Thread Nick Couchman
One of the MDSs crashed over the weekend (late Friday night), but I believe that one was not active and was just in Replay mode. Other than that, I don't know of anything that would have affected the MDSs. -Nick On 2012/10/18 at 16:55, Gregory Farnum g...@inktank.com wrote: Okay, looked at

Re: Help...MDS Continuously Segfaulting

2012-10-19 Thread Gregory Farnum
I've written a small patch on top of v0.48.1argonaut which should avoid this. It's in branch 3369-mds-session-workaround and will simply log an error in the monitor central log instead of segfaulting. There should shortly be packages available at

Re: Help...MDS Continuously Segfaulting

2012-10-18 Thread Nick Couchman
Hopefully this is what you're looking for... (gdb) bt #0 ESession::replay (this=0x7fffcc49a7c0, mds=0x127d5f0) at mds/journal.cc:828 #1 0x006a2446 in MDLog::_replay_thread (this=0x1281390) at mds/MDLog.cc:580 #2 0x004cf5ed in MDLog::ReplayThread::entry (this=<optimized out>) at

Re: Help...MDS Continuously Segfaulting

2012-10-18 Thread Gregory Farnum
Yep, thanks! I'll have to go through and see if I can figure out what's going on there. On Thu, Oct 18, 2012 at 8:56 AM, Nick Couchman nick.couch...@seakr.com wrote: Hopefully this is what you're looking for... (gdb) bt #0 ESession::replay (this=0x7fffcc49a7c0, mds=0x127d5f0) at

Re: Help...MDS Continuously Segfaulting

2012-10-18 Thread Gregory Farnum
Okay, looked at this a little bit. Can you describe what was happening before you got into this failed-replay loop? (So, why was it in replay at all?) I see that the monitor marked it as laggy for some reason; was the cluster under load; did the monitors break; something else? I can see why it's

Re: Help...MDS Continuously Segfaulting

2012-10-17 Thread Nick Couchman
Thanks...here's the backtrace: (gdb) bt #0 0x004dcfea in ESession::replay(MDS*) () #1 0x006a2446 in MDLog::_replay_thread() () #2 0x004cf5ed in MDLog::ReplayThread::entry() () #3 0x7764df05 in start_thread () from /lib64/libpthread.so.0 #4 0x7680d10d in

Re: Help...MDS Continuously Segfaulting

2012-10-17 Thread Sam Lang
On 10/17/2012 09:42 AM, Nick Couchman wrote: Thanks...here's the backtrace: (gdb) bt #0 0x004dcfea in ESession::replay(MDS*) () #1 0x006a2446 in MDLog::_replay_thread() () #2 0x004cf5ed in MDLog::ReplayThread::entry() () #3 0x7764df05 in start_thread () from

Re: Help...MDS Continuously Segfaulting

2012-10-17 Thread Nick Couchman
Hmmm...I don't seem to have the dbg packages built...will have to go back and figure out how to build those. -Nick On 2012/10/17 at 09:53, Sam Lang sam.l...@inktank.com wrote: On 10/17/2012 09:42 AM, Nick Couchman wrote: Thanks...here's the backtrace: (gdb) bt #0 0x004dcfea in

Re: Help...MDS Continuously Segfaulting

2012-10-17 Thread Sam Lang
On 10/17/2012 11:23 AM, Nick Couchman wrote: Hmmm...I don't seem to have the dbg packages built...will have to go back and figure out how to build those. Ah, I thought you had installed from Debian binaries. If you compiled ceph yourself, to get the debugging symbols you have to reconfigure

Re: Help...MDS Continuously Segfaulting

2012-10-16 Thread Gregory Farnum
Okay, that's the right debugging but it wasn't quite as helpful on its own as I expected. Can you get a core dump (you might already have one, depending on system settings) of the crash and open it up with gdb and get a full backtrace? -Greg On Mon, Oct 15, 2012 at 10:59 AM, Nick Couchman

Help...MDS Continuously Segfaulting

2012-10-15 Thread Nick Couchman
Well, both of my MDSs seem to be down right now, and then continually segfault (every time I try to start them) with the following: ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f starting mds.b at :/0 *** Caught signal (Segmentation fault) ** in thread 7fbe0d61d700 ceph version

Re: Help...MDS Continuously Segfaulting

2012-10-15 Thread Gregory Farnum
Something in the MDS log is bad or is poking at a bug in the code. Can you turn on MDS debugging and restart a daemon and put that log somewhere accessible? debug mds = 20 debug journaler = 20 debug ms = 1 -Greg On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman nick.couch...@seakr.com wrote: Well,
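
For readers following along, here is a minimal sketch of how the three debug settings above might look in a config file. It assumes they go in the [mds] section of the /etc/ceph/ceph.conf that the original report passes with -c; the thread itself does not say which section to use, so treat the placement as an assumption.

```ini
; Assumed placement: [mds] section of /etc/ceph/ceph.conf (not stated in the thread)
[mds]
    debug mds = 20
    debug journaler = 20
    debug ms = 1
```

The MDS daemon has to be restarted for the settings to take effect, as requested above. Expect a very large log at these levels: the follow-up in this thread reports a little over a million lines.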

Re: Help...MDS Continuously Segfaulting

2012-10-15 Thread Nick Couchman
Anywhere in particular I should make it available? It's a little over a million lines of debug in the file - I can put it on a pastebin, if that works, or perhaps zip it up and throw it somewhere? -Nick On 2012/10/15 at 11:26, Gregory Farnum g...@inktank.com wrote: Something in the MDS log

Re: Help...MDS Continuously Segfaulting

2012-10-15 Thread Gregory Farnum
Yeah, zip it and post — somebody's going to have to download it and do fun things. :) -Greg On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman nick.couch...@seakr.com wrote: Anywhere in particular I should make it available? It's a little over a million lines of debug in the file - I can put it