Re: [ceph-users] mds slow request, getattr currently failed to rdlock. Kraken with Bluestore

2017-05-24 Thread Daniel K
Yes -- the crashed server also mounted cephfs as a client, and likely had active writes to the file when it crashed. I have the max file size set to 17,592,186,044,416 -- but this file was about 5.8TB. The likely reason for the crash? The file was mounted as a fileio backstore to LIO, which
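
The 17,592,186,044,416 figure is the CephFS max_file_size setting in bytes (16 TiB). As a minimal sketch -- assuming the filesystem is simply named "cephfs" -- the limit can be checked and raised with the standard CLI:

    # show the filesystem settings, including the current max_file_size
    ceph fs get cephfs
    # raise the per-file size cap (value in bytes; 17592186044416 = 16 TiB)
    ceph fs set cephfs max_file_size 17592186044416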

Re: [ceph-users] mds slow request, getattr currently failed to rdlock. Kraken with Bluestore

2017-05-24 Thread Gregory Farnum
On Wed, May 24, 2017 at 3:15 AM, John Spray wrote:
> On Tue, May 23, 2017 at 11:41 PM, Daniel K wrote:
>> Have a 20 OSD cluster - "my first ceph cluster" - that has another 400
>> OSDs en route.
>>
>> I was "beating up" on the cluster, and had been writing to a 6TB file in
>> CephFS for several hours
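
Since the blocked getattr is presumably waiting on capabilities still held by the crashed client, one way to confirm and clear that is via the MDS admin socket -- a sketch only, assuming the active daemon is mds.a and using a placeholder session id; the exact evict syntax varies between releases:

    # list client sessions known to the active MDS (run on the MDS host)
    ceph daemon mds.a session ls
    # evict the stale session by id (4305 is a placeholder; syntax differs per release)
    ceph daemon mds.a session evict 4305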

Re: [ceph-users] mds slow request, getattr currently failed to rdlock. Kraken with Bluestore

2017-05-24 Thread Daniel K
Networking is 10Gig. I notice recovery IO is wildly variable; I assume that's normal. There is very little load, as this is yet to go into production -- I was "seeing what it would handle" at the time it broke. I checked this morning and the slow request had gone and I could access the blocked file again. A
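
For anyone hitting the same symptom, a quick way to see whether the MDS still has the request stuck (assuming the active daemon is mds.a) is the health output plus the MDS admin socket:

    # cluster-wide detail on the slow request warning
    ceph health detail
    # MDS operations currently in flight, with how long each has been blocked
    ceph daemon mds.a dump_ops_in_flight
    # outstanding RADOS operations issued by the MDS, in case it is waiting on OSDs
    ceph daemon mds.a objecter_requests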

Re: [ceph-users] mds slow request, getattr currently failed to rdlock. Kraken with Bluestore

2017-05-24 Thread John Spray
On Tue, May 23, 2017 at 11:41 PM, Daniel K wrote:
> Have a 20 OSD cluster - "my first ceph cluster" - that has another 400 OSDs
> en route.
>
> I was "beating up" on the cluster, and had been writing to a 6TB file in
> CephFS for several hours, during which I changed the crushmap to better
> match my
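
For watching the recovery IO that a crushmap change kicks off, the standard status commands are enough (nothing specific to this cluster):

    # one-shot cluster status, including recovery/backfill throughput
    ceph -s
    # follow the cluster log and recovery progress continuously
    ceph -w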

[ceph-users] mds slow request, getattr currently failed to rdlock. Kraken with Bluestore

2017-05-23 Thread Daniel K
Have a 20 OSD cluster - "my first ceph cluster" - that has another 400 OSDs en route. I was "beating up" on the cluster and had been writing to a 6TB file in CephFS for several hours, during which I changed the crushmap to better match my environment, generating a bunch of recovery IO. After about 5.
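
For reference, the usual decompile/edit/recompile cycle for a crushmap change like the one described above -- file names here are arbitrary placeholders -- looks like this:

    # dump the current compiled crushmap
    ceph osd getcrushmap -o crushmap.bin
    # decompile it to editable text
    crushtool -d crushmap.bin -o crushmap.txt
    # ... edit crushmap.txt (buckets, rules) ...
    # recompile and inject the modified map
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new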