Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-06-21 Thread Yan, Zheng
== > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > > From: Yan, Zheng > Sent: 20 May 2019 13:34 > To: Frank Schilder > Cc: Stefan Kooman; ceph-users@lists.ceph.com > Subject: Re: [ceph-users] mimic: MDS st

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-06-21 Thread Frank Schilder
13:34 To: Frank Schilder Cc: Stefan Kooman; ceph-users@lists.ceph.com Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?) On Sat, May 18, 2019 at 5:47 PM Frank Schilder wrote: > > Dear Yan and Stefan, > > it happened again and there were only

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-20 Thread Frank Schilder
To: Frank Schilder Cc: Stefan Kooman; ceph-users@lists.ceph.com Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?) On Sat, May 18, 2019 at 5:47 PM Frank Schilder wrote: > > Dear Yan and Stefan, > > it happened again and there were only very few ops in

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-20 Thread Yan, Zheng
On Sat, May 18, 2019 at 5:47 PM Frank Schilder wrote: > > Dear Yan and Stefan, > > it happened again and there were only very few ops in the queue. I pulled the > ops list and the cache. Please find a zip file here: > "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l" . It's a bit

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-18 Thread Frank Schilder
To: Frank Schilder Cc: Yan, Zheng; ceph-users@lists.ceph.com Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?) Quoting Frank Schilder (fr...@dtu.dk): > > [root@ceph-01 ~]# ceph status # before the MDS failed over > cluster: > id: ### > heal

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-18 Thread Stefan Kooman
Quoting Frank Schilder (fr...@dtu.dk): > > [root@ceph-01 ~]# ceph status # before the MDS failed over > cluster: > id: ### > health: HEALTH_WARN > 1 MDSs report slow requests > > services: > mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03 > mgr: ceph-01(active),

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-18 Thread Frank Schilder
Hi Stefan, cc Yan, thanks for your quick reply. > I am pretty sure you hit bug #26982: https://tracker.ceph.com/issues/26982 > "mds: crash when dumping ops in flight". Everything is fine, the daemon did not crash. The dump cache operation seems to be a blocking operation. It simply blocked

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-18 Thread Stefan Kooman
Quoting Frank Schilder (fr...@dtu.dk): > Dear Yan and Stefan, > > it happened again and there were only very few ops in the queue. I > pulled the ops list and the cache. Please find a zip file here: > "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l" . > It's a bit more than 100MB.

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-18 Thread Frank Schilder
Dear Yan and Stefan, it happened again and there were only very few ops in the queue. I pulled the ops list and the cache. Please find a zip file here: "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l" . It's a bit more than 100MB. The active MDS failed over to the standby

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Yan, Zheng
hope it doesn't take too long. > > Thanks for your input! > > = > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > > From: Yan, Zheng > Sent: 16 May 2019 09:35 > To: Frank Schilder > Subject: Re: [c

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Frank Schilder
From: Yan, Zheng Sent: 16 May 2019 09:35 To: Frank Schilder Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?) On Thu, May 16, 2019 at 2:52 PM Frank Schilder wrote: > > Dear Yan, > > OK, I will try to trigger the problem again and dump the

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Frank Schilder
childer > Cc: Stefan Kooman; ceph-users@lists.ceph.com > Subject: Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS > bug?) > > > [...] > > This time I captured the MDS ops list (log output does not really contain > > more info than this list). It con

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Stefan Kooman
Quoting Frank Schilder (fr...@dtu.dk): > Dear Stefan, > > thanks for the fast reply. We encountered the problem again, this time in a > much simpler situation; please see below. However, let me start with your > questions first: > > What bug? -- In a single-active MDS set-up, should there ever

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Frank Schilder
-users] mimic: MDS standby-replay causing blocked ops (MDS bug?) > [...] > This time I captured the MDS ops list (log output does not really contain > more info than this list). It contains 12 ops and I will include it here in > full length (hope this is acceptable): > Your iss

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-15 Thread Yan, Zheng
> "duration": 23.463467, > "type_data": { > "flag_point": "failed to authpin, dir is being fragmented", > "reqid": "client.377552:5446", > "op_ty

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-15 Thread Frank Schilder
"time": "2019-05-15 11:38:36.511392", "event": "all_read" }, { "time": "2019-05-15 11:38:36.511561", "eve

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-14 Thread Stefan Kooman
Quoting Frank Schilder (fr...@dtu.dk): If at all possible I would: Upgrade to 13.2.5 (there have been quite a few MDS fixes since 13.2.2). Use more recent kernels on the clients. The settings below for [mds] might help with trimming (you might already have changed mds_log_max_segments to 128

[ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-13 Thread Frank Schilder
Short story: We have a new HPC installation with file systems provided by cephfs (home, apps, ...). We have one cephfs and all client file systems are sub-directory mounts. On this ceph file system, we have a bit more than 500 nodes with currently 2 ceph fs mounts each, resulting