On Fri, 25 Jan 2019, 08:53 Vijay Bellur <vbel...@redhat.com> wrote:

> Thank you for the detailed update, Xavi! This looks very interesting.
>
> On Thu, Jan 24, 2019 at 7:50 AM Xavi Hernandez <xhernan...@redhat.com>
> wrote:
>
>> Hi all,
>>
>> I've just updated a patch [1] that implements a new thread pool based on
>> a wait-free queue provided by the userspace-rcu library. The patch also
>> includes an auto-scaling mechanism that keeps only as many threads
>> running as the current workload needs.
>>
>> This new approach has some advantages:
>>
>> - It's provided globally inside libglusterfs instead of inside an
>>   xlator
>>
>> This makes it possible for the fuse thread and epoll threads to hand a
>> received request over to another thread sooner, wasting less CPU and
>> reacting sooner to other incoming requests.
>>
>> - Adding jobs to the queue used by the thread pool only requires an
>>   atomic operation
>>
>> This makes the producer side of the queue really fast, with almost no
>> delay.
>>
>> - Contention is reduced
>>
>> The producer side has negligible contention thanks to the wait-free
>> enqueue operation based on an atomic access. The consumer side requires a
>> mutex, but it is held only very briefly, and the scaling mechanism makes
>> sure that there are no more threads than needed contending for the mutex.
>>
>> This change disables io-threads, since it replaces part of its
>> functionality. However, there are two things that could still be needed
>> from io-threads:
>>
>> - Prioritization of fops
>>
>> Currently, io-threads assigns a priority to each fop, so that some fops
>> are handled before others.
>>
>> - Fair distribution of execution slots between clients
>>
>> Currently, io-threads processes requests from each client in round-robin.
>>
>> These features are not implemented right now. If they are needed,
>> probably the best thing to do would be to keep them inside io-threads,
>> but change its implementation so that it uses the global threads from
>> the thread pool instead of its own threads.
>
> These features are indeed useful to have, so modifying the implementation
> of io-threads to provide this behavior would be welcome.
>
>> These tests have shown that the limiting factor has been the disk in
>> most cases, so it's hard to tell if the change has really improved
>> things. There is only one clear exception: self-heal on a dispersed
>> volume completes 12.7% faster. CPU utilization has also dropped
>> drastically:
>>
>> Old implementation: 12.30 user, 41.78 sys, 43.16 idle, 0.73 wait
>>
>> New implementation: 4.91 user, 5.52 sys, 81.60 idle, 5.91 wait
>>
>> Now I'm running some more tests on NVMe to try to see the effects of the
>> change when disk is not limiting performance. I'll update once I have
>> more data.
>
> Will look forward to these numbers.
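For anyone who wants to picture the queueing model described above, here is
a rough, self-contained sketch built on userspace-rcu's wfcqueue API. This
is not the code from the patch: job_t, pool_submit() and worker() are
made-up names, and the real patch adds the auto-scaling logic on top of
this. It only shows the two properties discussed above: the producer side
is a single wait-free enqueue, and contention between consumers is confined
to the short dequeue path.

/*
 * Illustrative sketch only (not the actual patch code): a thread pool fed
 * by userspace-rcu's wait-free concurrent queue (wfcqueue).
 * Producers enqueue with one wait-free operation; consumers are serialized
 * by the queue's dequeue mutex, held only for the duration of a dequeue.
 * Typical build: gcc -O2 sketch.c -o sketch -lurcu-common -lpthread
 */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <urcu/compiler.h>   /* caa_container_of() */
#include <urcu/wfcqueue.h>   /* cds_wfcq_* */

/* Hypothetical job descriptor: real code would queue a call stub/fop. */
typedef struct job {
    struct cds_wfcq_node node;  /* queue linkage */
    void (*fn)(void *);         /* work to execute */
    void *data;
} job_t;

static struct cds_wfcq_head queue_head;
static struct cds_wfcq_tail queue_tail;
static sem_t pending;           /* counts queued jobs, wakes idle workers */

/* Producer side (fuse/epoll threads): wait-free enqueue plus a semaphore
 * post, so the calling thread goes back to polling almost immediately. */
static void pool_submit(void (*fn)(void *), void *data)
{
    job_t *job = malloc(sizeof(*job));

    cds_wfcq_node_init(&job->node);
    job->fn = fn;
    job->data = data;

    cds_wfcq_enqueue(&queue_head, &queue_tail, &job->node);
    sem_post(&pending);
}

/* Consumer side (pool workers): sleep until work is available, then
 * dequeue. cds_wfcq_dequeue_blocking() serializes dequeuers internally
 * with a mutex. A real pool would also start/stop workers here depending
 * on the observed load. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        sem_wait(&pending);
        struct cds_wfcq_node *node =
            cds_wfcq_dequeue_blocking(&queue_head, &queue_tail);
        if (!node)
            continue;           /* defensive; each post maps to one job */

        job_t *job = caa_container_of(node, job_t, node);
        job->fn(job->data);
        free(job);
    }
    return NULL;
}

static void say_hello(void *data)
{
    printf("processed job: %s\n", (const char *)data);
}

int main(void)
{
    pthread_t threads[4];

    cds_wfcq_init(&queue_head, &queue_tail);
    sem_init(&pending, 0, 0);

    for (int i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, worker, NULL);

    pool_submit(say_hello, "hello");
    pool_submit(say_hello, "world");

    sleep(1);                   /* crude: let the workers drain the queue */
    return 0;
}

The key point is that pool_submit() never takes a lock, while workers only
contend on the dequeue mutex, which the scaling logic keeps cheap by not
running more workers than the load requires.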
I have identified an issue that limits the number of active threads when
load is high, causing some regressions. I'll fix it and rerun the tests on
Monday.

Xavi

> Regards,
> Vijay
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel