Great to hear! Thanks for the update.

On Thu, Jan 14, 2021 at 5:18 PM Charles-François Natali <cf.nat...@gmail.com> wrote:
> It's a bit old, but in case it helps: we recently implemented this at
> work. Here's how we did it:
> - the NUMA topology is exposed via agent custom resources
> - the framework allocates the corresponding resources to the tasks
>   according to the NUMA topology: e.g. if a task requests 2 CPUs within
>   the same NUMA node, the framework allocates them that way
> - a custom executor then implements the CPU affinity/cpuset using the
>   resources provided by the framework
>
> It works really nicely.
>
> Cheers,
>
> Charles
>
>
> On Tue, Jul 7, 2020 at 18:12, Milind Chabbi <mil...@uber.com> wrote:
> >
> > Grégoire, thanks for your reply. This is super helpful for making a
> > stronger case around the affinity benefits.
> > Would you be able to offer the additional details that you mentioned?
> > I am definitely interested.
> > Is your isolator source code publicly available?
> >
> > -Milind
> >
> > On Tue, Jul 7, 2020 at 3:14 AM Grégoire Seux <g.s...@criteo.com> wrote:
> >>
> >> Hello,
> >>
> >> I'd like to share our experience, because we worked on this last year.
> >> We used CFS bandwidth isolation for several years and encountered
> >> many issues (lack of predictability, bugs present in old Linux kernels,
> >> and lack of cache/memory locality). At some point, we implemented a
> >> custom isolator to manage cpusets (using
> >> https://github.com/criteo/mesos-command-modules/ as a base to write an
> >> isolator in a scripting language).
> >>
> >> The isolator had a very simple behavior: when a new task starts, look
> >> at which CPUs are not within a cpuset cgroup, select (if possible) CPUs
> >> from the same NUMA node, and create a cpuset cgroup for the starting
> >> task.
> >> In practice, it provided a general decrease in CPU consumption (up to
> >> 8% for some CPU-intensive applications) and a better ability to reason
> >> about the CPU isolation model.
> >> The allocation is optimistic: it tries to use CPUs from the same NUMA
> >> node, but if that's not possible, the task is spread across nodes. In
> >> practice this happens very rarely, thanks to one small optimization:
> >> assigning CPUs from the most loaded NUMA node, which decreases
> >> fragmentation of available CPUs across NUMA nodes.
> >>
> >> I'd be glad to give more details if you are interested.
> >>
> >> --
> >> Grégoire
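
For readers who want to see the selection policy Grégoire describes in concrete form, here is a minimal sketch in Python. It is not the Criteo isolator: the function name `pick_cpus`, the topology dictionary, and the `used` set are all hypothetical, and the fallback order when no single node fits (nodes with the most free CPUs first, to span as few nodes as possible) is an assumption, since the thread does not specify it.

```python
from typing import Dict, List, Set


def pick_cpus(topology: Dict[int, List[int]], used: Set[int], count: int) -> List[int]:
    """Pick `count` free CPUs, preferring a single NUMA node (hypothetical helper).

    topology maps a NUMA node id to its CPU ids; used holds the CPUs that
    already belong to some task's cpuset.
    """
    # Free CPUs per node, i.e. CPUs not yet assigned to any cpuset.
    free = {node: [c for c in cpus if c not in used] for node, cpus in topology.items()}
    if sum(len(cpus) for cpus in free.values()) < count:
        raise ValueError("not enough free CPUs")

    # Most loaded node first (fewest free CPUs); ties broken by node id.
    by_load = sorted(free, key=lambda node: (len(free[node]), node))

    # Optimistic path: the most loaded node that can still hold the whole task.
    for node in by_load:
        if len(free[node]) >= count:
            return free[node][:count]

    # Fallback (assumed order): spread across nodes, emptiest nodes first.
    picked: List[int] = []
    for node in reversed(by_load):
        picked.extend(free[node][: count - len(picked)])
        if len(picked) == count:
            break
    return picked
```

With two nodes of four CPUs each and CPUs 0, 1 and 4 already taken, a request for two CPUs lands on node 0 (the more loaded node), which leaves node 1 with three contiguous free CPUs for a later, larger task. That is the fragmentation benefit mentioned above.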