It should apply cleanly on top of 0.48.2. There may be a 0.48.3, but
it won't be driven by this patch.
-Greg
On Sat, Nov 3, 2012 at 7:27 PM, Nick Couchman wrote:
Okay - I'm planning to try to go to version 0.48.2, the latest stable - is the
patch available for that branch, or will there be a 0.48.3 release coming?
>>> Gregory Farnum 11/03/12 11:45 AM >>>
Sage merged it into master, so whatever you like. If you remove the
patch and the error happens again, your MDS will fail on replay as it
did here. If you leave it in, it has no effect other than handling
that particular bad case.
-Greg
On Tue, Oct 30, 2012 at 3:22 AM, Nick Couchman wrote:
Okay, that patch worked and it seems to be running again. Should I continue
to run with that patch, or go back to the original binaries?
>>> Gregory Farnum 10/19/12 4:16 PM >>>
I've written a small patch on top of v0.48.1argonaut which should
avoid this. It's in branch 3369-mds-session-workaround and will simply
log an error in the monitor central log instead of segfaulting. There
should shortly be packages available at
http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-b
One of the MDSs crashed over the weekend (late Friday night), but I believe
that one was not active and was just in Replay mode. Other than that, I don't
know of anything that would have affected the MDSs.
-Nick
>>> On 2012/10/18 at 16:55, Gregory Farnum wrote:
Okay, looked at this a little bit. Can you describe what was happening
before you got into this failed-replay loop? (So, why was it in replay
at all?) I see that the monitor marked it as laggy for some reason;
was the cluster under load; did the monitors break; something else?
I can see why it's fa
Yep, thanks! I'll have to go through and see if I can figure out
what's going on there.
On Thu, Oct 18, 2012 at 8:56 AM, Nick Couchman wrote:
Hopefully this is what you're looking for...
(gdb) bt
#0 ESession::replay (this=0x7fffcc49a7c0, mds=0x127d5f0) at mds/journal.cc:828
#1 0x006a2446 in MDLog::_replay_thread (this=0x1281390) at mds/MDLog.cc:580
#2 0x004cf5ed in MDLog::ReplayThread::entry (this=) at mds/MDLog.h:86
On 10/17/2012 11:23 AM, Nick Couchman wrote:
Hmmm...I don't seem to have the dbg packages built...will have to go back and
figure out how to build those.
Ah I thought you had installed from debian binaries. If you compiled
ceph yourself, to get the debugging symbols you have to reconfigure
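For anyone hitting the same wall: a minimal sketch of such a rebuild, assuming the autotools source tree used by the 0.48.x releases. The flags and paths here are illustrative, not a confirmed recipe from this thread:

```shell
# Hypothetical: rebuild ceph with debugging symbols, run from the
# top of the ceph source tree (autotools layout assumed).
./configure CFLAGS="-g -O2" CXXFLAGS="-g -O2"
make

# The resulting binary should then report "not stripped":
file src/ceph-mds
```

Installing the rebuilt binary (or pointing gdb at it directly) is enough for symbolic backtraces; there is no need to reinstall the whole cluster.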
>>> On 2012/10/17 at 09:53, Sam Lang wrote:
On 10/17/2012 09:42 AM, Nick Couchman wrote:
Thanks...here's the backtrace:
(gdb) bt
#0 0x004dcfea in ESession::replay(MDS*) ()
#1 0x006a2446 in MDLog::_replay_thread() ()
#2 0x004cf5ed in MDLog::ReplayThread::entry() ()
#3 0x7764df05 in start_thread () from /lib64/libpthread.so.0
#4 0x7680d10d in
On 10/16/2012 06:04 PM, Gregory Farnum wrote:
Okay, that's the right debugging but it wasn't quite as helpful on its
own as I expected. Can you get a core dump (you might already have
one, depending on system settings) of the crash and open it up with
gdb and get a full backtrace?
-Greg
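For reference, a minimal sketch of capturing a core from the crashing daemon and pulling a full backtrace out of it. The daemon path and core filename below are illustrative:

```shell
# Raise the core-size limit in the shell that launches the daemon.
# Assumption: cores land in the daemon's working directory; check
# /proc/sys/kernel/core_pattern for the actual destination on your distro.
ulimit -c unlimited
ulimit -c   # prints "unlimited" if the hard limit allows it

# Once the daemon has crashed and left a core file (names illustrative):
#   gdb /usr/bin/ceph-mds core
#   (gdb) thread apply all bt full
```

`thread apply all bt full` gives per-thread backtraces with local variables, which is usually what's wanted for a crash like this one.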
On Mon, Oct 15, 2012 at 10:59 AM, Nick Couchman wrote:
Yeah, zip it and post — somebody's going to have to download it and do
fun things. :)
-Greg
On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman wrote:
Anywhere in particular I should make it available? It's a little over a
million lines of debug in the file - I can put it on a pastebin, if that works,
or perhaps zip it up and throw it somewhere?
-Nick
>>> On 2012/10/15 at 11:26, Gregory Farnum wrote:
Something in the MDS log is bad or is poking at a bug in the code. Can
you turn on MDS debugging and restart a daemon and put that log
somewhere accessible?
debug mds = 20
debug journaler = 20
debug ms = 1
-Greg
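A sketch of where those settings typically live, assuming they are added to ceph.conf on the MDS host before restarting the daemon (the [mds] section is the usual spot so the extra logging only affects the metadata servers):

```ini
[mds]
    debug mds = 20
    debug journaler = 20
    debug ms = 1
```

The same keys can also be dropped under [global] if you want the verbosity everywhere, at the cost of much larger logs.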
On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman wrote:
Well, both of my MDSs are down right now, and they continually segfault
(every time I try to start them) with the following:
ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
starting mds.b at :/0
*** Caught signal (Segmentation fault) **
in thread 7fbe0d61d700
ceph version 0