Hi,

My cluster is back to HEALTH_OK; the affected host has been restarted by the user. I will debug the host further the next time I see this issue.

PS: For completeness: I stated that this issue was often seen in my current Jewel environment, but I meant to say that it comes up only occasionally. However, when I *do* hit this issue, it blocks some client I/O as a consequence.

That's why I assume that the root cause might be a bug in ceph-fuse. ceph-fuse supports the page cache (I'm not sure whether this is active by default), and AFAIK it has to keep the capabilities for a file as long as that file is still in the cache. If another client wants to access the file, the MDS may need to revoke the capabilities for the cached file (e.g. if one client wants to overwrite a file that another client has read before). The second client has to wait until it can acquire the capabilities, which results in blocked I/O.
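To illustrate the blocking pattern described above, here is a toy Python model. It is only a sketch under simplifying assumptions and not Ceph's actual capability protocol: the names (ToyClient, ToyMDS, grant_read, request_write, release) are hypothetical, a single set of "read caps" per file stands in for Ceph's real capability bits, and a client is assumed to hold its caps for as long as the file sits in its cache. The writer cannot proceed until the reader answers the revoke and drops the file from its cache.

```python
from dataclasses import dataclass, field


@dataclass
class ToyClient:
    """Hypothetical client: keeps caps for every file it still has cached."""
    name: str
    cached: set = field(default_factory=set)


@dataclass
class ToyMDS:
    """Hypothetical, heavily simplified MDS (not Ceph's real cap protocol)."""
    readers: dict = field(default_factory=dict)          # path -> names holding read caps
    pending_writes: dict = field(default_factory=dict)   # path -> names waiting to write
    revoke_requests: list = field(default_factory=list)  # (client name, path) revokes sent

    def grant_read(self, client: ToyClient, path: str) -> None:
        # The client reads the file and keeps it (and therefore its caps) cached.
        self.readers.setdefault(path, set()).add(client.name)
        client.cached.add(path)

    def request_write(self, client: ToyClient, path: str) -> bool:
        # Other clients holding caps on the file must give them up first.
        holders = self.readers.get(path, set()) - {client.name}
        if not holders:
            return True  # granted immediately
        for holder in sorted(holders):
            self.revoke_requests.append((holder, path))
        self.pending_writes.setdefault(path, []).append(client.name)
        return False  # the writer's I/O is blocked until the caps are released

    def release(self, client: ToyClient, path: str) -> list:
        # The client answers the revoke: it drops the cached file and its caps.
        client.cached.discard(path)
        holders = self.readers.get(path, set())
        holders.discard(client.name)
        waiting = self.pending_writes.get(path, [])
        if holders - set(waiting):
            return []  # other holders still have to release
        return self.pending_writes.pop(path, [])  # these writers are now granted


if __name__ == "__main__":
    mds = ToyMDS()
    reader, writer = ToyClient("reader"), ToyClient("writer")
    mds.grant_read(reader, "/cephfs/file")            # reader caches the file
    print(mds.request_write(writer, "/cephfs/file"))  # False: writer blocks
    print(mds.revoke_requests)                        # revoke sent to "reader"
    print(mds.release(reader, "/cephfs/file"))        # ['writer']: write granted
```

In this model the writer waits exactly as long as the reader takes to answer the revoke; if the caching client is slow or stuck (as a buggy cache path might be), the writer simply stays blocked.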

We had similar problems with ceph-fuse in the past, especially when page cache support was active. We have since switched to the kernel-based CephFS client (with its own pros and cons).

Regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com