Subject: Re: [ceph-users] Ceph MDS laggy
On Mon, Mar 25, 2019 at 07:13:20PM +0800, Yan, Zheng wrote:
> Yes. the fix is in 12.2.11
Great, thanks.
--
Mark Schouten | Tuxis Internet Engineering
KvK: 61527076 | http://www.tuxis.nl/
T: 0318 200208 | i...@tuxis.nl
On Mon, Jan 21, 2019 at 10:17:31AM +0800, Yan, Zheng wrote:
> It's http://tracker.ceph.com/issues/37977. Thanks for your help.
>
I think I've hit this bug. The Ceph MDS is using 100% CPU, reporting as
laggy, and being kicked out. I'm not sure, though, whether this fix is
currently in a released version of Luminous.
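For reference, a quick way to confirm what the running daemons actually
report (the daemon name below is a placeholder; ceph versions exists on
Luminous and later):

    ceph versions                  # version breakdown across mons, osds and mds daemons
    ceph tell mds.<name> version   # version reported by one specific MDS

If the MDS reports 12.2.11 or newer, the fix mentioned above should be present.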
It's http://tracker.ceph.com/issues/37977. Thanks for your help.
Regards
Yan, Zheng
I've heard of the same(?) problem on another cluster; they upgraded
from 12.2.7 to 12.2.10 and suddenly got problems with their CephFS
(and only with the CephFS).
However, they downgraded the MDS to 12.2.8 before I could take a look
at it, so I'm not sure what caused the issue. 12.2.8 works fine with t[...]
The same user's jobs seem to be the instigator of this issue again.
I've looked through their code and see nothing too onerous.
This time it was 2400+ cores/jobs on 186 nodes all working in the same
directory. Each job reads in a different 110KB file, crunches numbers
for a while (1+ hours), and then [...]
Just re-checked my notes. We updated from 12.2.8 to 12.2.10 on the
27th of December.
--
Adam
Yes, we upgraded to 12.2.10 from 12.2.7 on the 27th of December. This didn't
happen before then.
--
Adam
On Sat, Jan 19, 2019, 20:17 Paul Emmerich wrote:
Did this only start to happen after upgrading to 12.2.10?
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Sat, Jan 19, 2019 at 5:40 PM Adam Tygart wrote:
It worked for about a week, and then seems to have locked up again.
Here is the back trace from the threads on the mds:
http://people.cs.ksu.edu/~mozes/ceph-12.2.10-laggy-mds.gdb.txt
--
Adam
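For anyone who needs to capture a similar trace, a minimal sketch of
dumping backtraces of all threads from a live ceph-mds with gdb (assuming
a single ceph-mds process on the host; the output filename is arbitrary):

    gdb -p $(pidof ceph-mds) --batch \
        -ex 'set pagination off' \
        -ex 'thread apply all bt' > ceph-mds-threads.txt

Attaching gdb pauses the daemon while the backtraces are collected, so on
a struggling MDS this is safest when a standby is available to take over.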
Restarting the nodes causes the hanging again. This means that this is
workload dependent and not a transient state.
I believe I've tracked down what is happening. One user was running
1500-2000 jobs in a single directory with 92000+ files in it. I am
wondering if the cluster was getting ready to [...]
On a hunch, I shut down the compute nodes for our HPC cluster, and 10
minutes after that restarted the mds daemon. It replayed the journal,
evicted the dead compute nodes, and is working again.
This leads me to believe there was a broken transaction of some kind
coming from the compute nodes (also a [...]
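As an aside, the session state and evictions described here can also be
inspected and driven by hand; a rough sketch using the standard
admin-socket and tell interfaces (the daemon name and client id are
placeholders):

    ceph daemon mds.<name> session ls                  # list client sessions known to this MDS
    ceph tell mds.<name> client evict id=<client-id>   # forcibly evict one client

An evicted client is blacklisted, so affected nodes typically need a
remount (or reboot) before they can use the filesystem again.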
Hello all,
I've got a 31 machine Ceph cluster running ceph 12.2.10 and CentOS 7.6.
We're using cephfs and rbd.
Last night, one of our two active/active mds servers went laggy, and
upon restart, once it goes active it immediately goes laggy again.
I've got a log available here (debug_mds 20, debug [...]
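For anyone wanting to capture logs at this verbosity, one rough sketch,
either persistently in ceph.conf or injected at runtime (the daemon name
is a placeholder):

    [mds]
        debug mds = 20

    # or, at runtime without a restart:
    ceph tell mds.<name> injectargs '--debug_mds 20'

Level 20 is extremely verbose, so it is usually turned back down once the
interesting window has been captured.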
--
[...] that the mds has passed the beacon to mon or not?
Thank you so much Zheng!
Bazli
-----Original Message-----
From: Yan, Zheng [mailto:uker...@gmail.com]
Sent: Tuesday, April 29, 2014 10:13 PM
To: Mohd Bazli Ab Karim
Cc: Luke Jing Yuan; Wong Ming Tat
Subject: Re: [ceph-users] Ceph mds laggy and failed assert in function replay
mds/journal.cc
On Tue, Apr 29, 2014 at 3:13 PM, Jingyuan Luke wrote:
> Hi,
>
> Assuming we got MDS working back on track, should we still leave the
> mds_wipe_sessions in the ceph.conf or remove it and restart MDS.
> Thanks.
No.
It has been several hours. The MDS still has not finished replaying the journal?
Regards,
Yan, Zheng
Hi,
Assuming we get the MDS working back on track, should we still leave
mds_wipe_sessions in ceph.conf, or should we remove it and restart the MDS?
Thanks.
Regards,
Luke
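For context, the option under discussion is set in the [mds] section of
ceph.conf; a minimal sketch, assuming the spelling used in this thread
(in releases of that era it was an emergency recovery knob, intended to
be removed again once the MDS was healthy):

    [mds]
        mds wipe sessions = true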
On Tue, Apr 29, 2014 at 11:24 AM, Jingyuan Luke wrote:
Hi,
We had applied the patch and recompiled ceph, as well as updated
ceph.conf as suggested; when we re-ran ceph-mds we noticed the
following:
2014-04-29 10:45:22.260798 7f90b971d700 0 log [WRN] : replayed op
client.324186:51366457,12681393 no session for client.324186
2014-04-29 10:45:2 [...]
On Sat, Apr 26, 2014 at 9:56 AM, Jingyuan Luke wrote:
Hi Greg,
Actually our cluster is pretty empty, but we suspect we had a temporary
network disconnection to one of our OSDs; we're not sure if this caused
the problem.
Anyway, we don't mind trying the method you mentioned; how can we do that?
Regards,
Luke
On Saturday, April 26, 2014, Gregory Farnum wrote:
Hmm, it looks like your on-disk SessionMap is horrendously out of
date. Did your cluster get full at some point?
In any case, we're working on tools to repair this now but they aren't
ready for use yet. Probably the only thing you could do is create an
empty sessionmap with a higher version than the [...]
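The repair tooling alluded to here did eventually ship: later Ceph
releases grew cephfs-table-tool, which can reset the on-disk session
table outright. A rough sketch of that later workflow (not available at
the time of this thread, and destructive, so disaster-recovery only):

    # stop all MDS daemons first, then:
    cephfs-table-tool all reset session

The CephFS disaster-recovery documentation covers the full procedure and
its caveats.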
Dear Ceph-devel, ceph-users,
I am currently facing an issue with my ceph mds server. The ceph-mds daemon
does not want to come back up.
I tried running it manually with ceph-mds -i mon01 -d, but it got aborted,
and the log shows that it gets stuck at failed assert(session) at line 1303
in mds/journal.cc.
Can someone shed some light on this?