Hi Paul,

On Wed, Dec 13, 2023 at 9:50 PM Paul Mezzanini <pfm...@rit.edu> wrote:
>
> Long story short, we've got a lot of empty directories that I'm working on 
> removing.  While removing directories, using "perf top -g" we can watch the 
> MDS daemon go to 100% CPU usage with "SnapRealm::split_at" and 
> "CInode::is_ancestor_of".
>
> It's this 2-year-old bug that's still around:
> https://tracker.ceph.com/issues/53192

Unfortunately, the fix isn't as straightforward as initially attempted, so
lately we've been working around these issues by pinning
to-be-deleted directories to a (separate) active MDS. This might need
some tuning at the application level to move things into this
"special" pinned directory and then delete them.

HTH.

>
> To help combat this, we've moved our snapshot schedule down the tree one 
> level so the snaprealm is significantly smaller.  Our luck with multiple 
> active MDSs hasn't been great, so we are still on a single MDS.  To help split 
> the load, I'm working on moving different workloads to different filesystems 
> within Ceph.
>
> A user can still fairly easily overwhelm the MDS's finisher thread and 
> basically stop all CephFS I/O through that MDS. I'm hoping we can get some 
> other people chiming in with "Me Too!" so there can be some traction behind 
> fixing this.
>
> It's a longstanding bug so the version is less important, but we are on 
> 17.2.7.
>
> Thoughts?
> -paul
>
> --
>
> Paul Mezzanini
> Platform Engineer III
> Research Computing
>
> Rochester Institute of Technology
>
>  “End users is a description, not a goal.”
>
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
