Re: [ceph-users] cephfs kernel client instability

2019-01-28 Thread Martin Palma
Upgrading to 4.15.0-43-generic fixed the problem. Best, Martin On Fri, Jan 25, 2019 at 9:43 PM Ilya Dryomov wrote: > > On Fri, Jan 25, 2019 at 9:40 AM Martin Palma wrote: > > > > > Do you see them repeating every 30 seconds? > > > > yes: > > > > Jan 25 09:34:37 sdccgw01 kernel: [6306813.737615]

Re: [ceph-users] cephfs kernel client instability

2019-01-25 Thread Ilya Dryomov
On Fri, Jan 25, 2019 at 9:40 AM Martin Palma wrote: > > > Do you see them repeating every 30 seconds? > > yes: > > Jan 25 09:34:37 sdccgw01 kernel: [6306813.737615] libceph: mon4 > 10.8.55.203:6789 session lost, hunting for new mon > Jan 25 09:34:37 sdccgw01 kernel: [6306813.737620] libceph: mon3

Re: [ceph-users] cephfs kernel client instability

2019-01-25 Thread Martin Palma
> Do you see them repeating every 30 seconds? yes: Jan 25 09:34:37 sdccgw01 kernel: [6306813.737615] libceph: mon4 10.8.55.203:6789 session lost, hunting for new mon Jan 25 09:34:37 sdccgw01 kernel: [6306813.737620] libceph: mon3 10.8.55.202:6789 session lost, hunting for new mon Jan 25 09:34:37
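
A quick way to see which monitor a kernel client currently holds a session with, and whether it keeps hopping between mons as these messages suggest, is the client-side debugfs directory (a sketch; it assumes debugfs is mounted and you are root, and the exact files can vary by kernel version):

  # one directory per mounted cluster instance
  ls /sys/kernel/debug/ceph/
  # current mon session and any pending mon requests
  cat /sys/kernel/debug/ceph/*/monc
  # the monmap as the kernel client sees it
  cat /sys/kernel/debug/ceph/*/monmap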

Re: [ceph-users] cephfs kernel client instability

2019-01-25 Thread Ilya Dryomov
On Fri, Jan 25, 2019 at 8:37 AM Martin Palma wrote: > > Hi Ilya, > > thank you for the clarification. After setting the > "osd_map_messages_max" to 10 the io errors and the MDS error > "MDS_CLIENT_LATE_RELEASE" are gone. > > The messages of "mon session lost, hunting for new new mon" didn't go >

Re: [ceph-users] cephfs kernel client instability

2019-01-24 Thread Martin Palma
Hi Ilya, thank you for the clarification. After setting "osd_map_messages_max" to 10 the io errors and the MDS error "MDS_CLIENT_LATE_RELEASE" are gone. The messages of "mon session lost, hunting for new mon" didn't go away... can it be that this is related to https://tracker.ceph.com/is
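
For reference, a sketch of how such a change can be applied, assuming the option in current releases is spelled osd_map_message_max (this thread writes osd_map_messages_max; verify the exact name with "ceph daemon mon.<id> config show | grep osd_map" on your version):

  # runtime change on mons and OSDs, no restart needed
  ceph tell mon.* injectargs '--osd_map_message_max 10'
  ceph tell osd.* injectargs '--osd_map_message_max 10'
  # persist it in ceph.conf so the lower value survives restarts
  # [global]
  #     osd map message max = 10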

Re: [ceph-users] cephfs kernel client instability

2019-01-24 Thread Ilya Dryomov
On Thu, Jan 24, 2019 at 6:21 PM Andras Pataki wrote: > > Hi Ilya, > > Thanks for the clarification - very helpful. > I've lowered osd_map_messages_max to 10, and this resolves the issue > about the kernel being unhappy about large messages when the OSDMap > changes. One comment here though: you m

Re: [ceph-users] cephfs kernel client instability

2019-01-24 Thread Ilya Dryomov
On Thu, Jan 24, 2019 at 8:16 PM Martin Palma wrote: > > We are experiencing the same issues on clients with CephFS mounted > using the kernel client and 4.x kernels. > > The problem shows up when we add new OSDs, on reboots after > installing patches and when changing the weight. > > Here the log

Re: [ceph-users] cephfs kernel client instability

2019-01-24 Thread Andras Pataki
Hi Ilya, Thanks for the clarification - very helpful. I've lowered osd_map_messages_max to 10, and this resolves the issue about the kernel being unhappy about large messages when the OSDMap changes. One comment here though: you mentioned that Luminous uses 40 as the default, which is indeed
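
To see what a given daemon is actually using, the admin socket can be queried directly (a sketch; adjust the daemon names, and note the option may be spelled osd_map_message_max rather than osd_map_messages_max depending on the release):

  ceph daemon mon.$(hostname -s) config get osd_map_message_max
  ceph daemon osd.0 config get osd_map_message_max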

Re: [ceph-users] cephfs kernel client instability

2019-01-24 Thread Martin Palma
We are experiencing the same issues on clients with CephFS mounted using the kernel client and 4.x kernels. The problem shows up when we add new OSDs, on reboots after installing patches, and when changing the weight. Here are the logs of a misbehaving client: [6242967.890611] libceph: mon4 10.8.55.
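
Adding OSDs, rebooting patched nodes and reweighting all produce bursts of new osdmap epochs that every client then has to catch up on; a rough way to see how quickly the map is churning during such an operation (a sketch):

  # the first line of the dump is the current osdmap epoch
  ceph osd dump | head -n 1
  # watch the epoch climb while OSDs are added or reweighted
  watch -n 5 'ceph osd dump | head -n 1'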

Re: [ceph-users] cephfs kernel client instability

2019-01-16 Thread Ilya Dryomov
On Wed, Jan 16, 2019 at 7:12 PM Andras Pataki wrote: > > Hi Ilya/Kjetil, > > I've done some debugging and tcpdump-ing to see what the interaction > between the kernel client and the mon looks like. Indeed - > CEPH_MSG_MAX_FRONT defined as 16Mb seems low for the default mon > messages for our clus

Re: [ceph-users] cephfs kernel client instability

2019-01-16 Thread Andras Pataki
Hi Ilya/Kjetil, I've done some debugging and tcpdump-ing to see what the interaction between the kernel client and the mon looks like. Indeed - CEPH_MSG_MAX_FRONT defined as 16 MB seems low for the default mon messages for our cluster (with osd_mon_messages_max at 100). We have about 3500 os
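
Back-of-the-envelope arithmetic, with assumed numbers (the per-map size below is illustrative, not measured): if a full OSDMap for a ~3500-OSD cluster is on the order of a few hundred KB, bundling 100 maps into one MOSDMap message easily overshoots a 16 MB front-section limit, while 10 maps per message stays well under it.

  # assuming ~350 KB per full map -- a made-up but plausible figure
  echo $(( 100 * 350 / 1024 )) MiB   # ~34 MiB, over the 16 MiB cap
  echo $((  10 * 350 / 1024 )) MiB   # ~3 MiB, comfortably under it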

Re: [ceph-users] cephfs kernel client instability

2019-01-16 Thread Ilya Dryomov
On Wed, Jan 16, 2019 at 1:27 AM Kjetil Joergensen wrote: > > Hi, > > you could try reducing "osd map message max", some code paths that end up as > -EIO (kernel: libceph: mon1 *** io error) is exceeding > include/linux/ceph/libceph.h:CEPH_MSG_MAX_{FRONT,MIDDLE,DATA}_LEN. > > This "worked for us"

Re: [ceph-users] cephfs kernel client instability

2019-01-15 Thread Kjetil Joergensen
Hi, you could try reducing "osd map message max"; some code paths end up as -EIO (kernel: libceph: mon1 *** io error) when a message exceeds include/linux/ceph/libceph.h:CEPH_MSG_MAX_{FRONT,MIDDLE,DATA}_LEN. This "worked for us" - YMMV. -KJ On Tue, Jan 15, 2019 at 6:14 AM Andras Pataki wrote: > An
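
The limits referred to here live in the kernel tree; checking the header that matches the running kernel shows the exact values (a sketch, and the numbers below are only what 4.x-era kernels roughly carry -- verify against your own source):

  # run inside a kernel source tree matching the running kernel
  grep -n 'CEPH_MSG_MAX_.*_LEN' include/linux/ceph/libceph.h
  # expected output is along the lines of:
  #   #define CEPH_MSG_MAX_FRONT_LEN   (16*1024*1024)
  #   #define CEPH_MSG_MAX_MIDDLE_LEN  (16*1024*1024)
  #   (CEPH_MSG_MAX_DATA_LEN differs between kernel versions)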

Re: [ceph-users] cephfs kernel client instability

2019-01-15 Thread Andras Pataki
An update on our cephfs kernel client troubles. After doing some heavier testing with a newer kernel 4.19.13, it seems like it also gets into a bad state when it can't connect to monitors (all back end processes are on 12.2.8): Jan 15 08:49:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 ses
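
To catch a client entering this state as it happens, following the kernel log for the libceph messages quoted in this thread is usually enough (a sketch):

  # follow the kernel ring buffer with human-readable timestamps
  dmesg -wT | grep -E 'libceph: mon[0-9]+ .*(session lost|io error|socket closed)'
  # or, on systemd hosts, the persistent journal
  journalctl -k -f | grep libceph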

Re: [ceph-users] cephfs kernel client instability

2019-01-03 Thread Andras Pataki
I wonder if anyone could offer any insight on the issue below, regarding the CentOS 7.6 kernel cephfs client connecting to a Luminous cluster. I have since tried a much newer 4.19.13 kernel, which did not show the same issue (but unfortunately for various reasons unrelated to ceph, we can't go