On Tue, 28 Jun 2022 09:25:50 +0200 Carsten Andrich <[email protected]> wrote:
> On 24.06.22 17:01, Stephen Hemminger wrote:
> > On Thu, 23 Jun 2022 21:03:49 +0200
> > Carsten Andrich <[email protected]> wrote:
> >
> >> 2. Use real-time priority (SCHED_FIFO w/ priority 99) for the DPDK
> >>    threads and
> >>      echo -1 > /proc/sys/kernel/sched_rt_runtime_us
> >>    to disable the runtime limit. With the runtime limit in place, the
> >>    SCHED_FIFO performance will be significantly worse than SCHED_OTHER.
> > This can cause major issues if application is normal DPDK application
> > (never does system calls).
> > If an interrupt or other event happens on your isolated CPU, the work
> > that it would do in soft irq is never performed. FIFO has higher
> > priority than kernel threads.
> > This can lead to mystery lockups from other applications (reads not
> > completing, network timeouts, etc).
>
> Thanks for pointing that out. Do you know of any official kernel
> documentation that could shed some light on that? I haven't had any
> serious issues like the ones you list, but maybe I've been lucky. My
> DPDK applications typically run on fairly minimal systems used
> exclusively for DPDK tasks, which require minimal latency/jitter. Minor
> side-effects from using SCHED_FIFO are tolerable in my case, if it
> improves performance.

Do some looking around and you will find good documentation like:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/7/html/tuning_guide/real_time_throttling

This characteristic of real-time threads means that it is quite easy to
write an application which monopolizes 100% of a given CPU. At first glance
this sounds like it might be a good idea, but in reality it causes lots of
headaches for the operating system. The OS is responsible for managing both
system-wide and per-CPU resources and must periodically examine data
structures describing these resources and perform housekeeping activities
with them. If a core is monopolized by a SCHED_FIFO thread, it cannot
perform the housekeeping tasks and eventually the entire system becomes
unstable, potentially causing a crash.
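
To make the throttling knob above a bit more concrete, here is a rough sketch (not taken from the kernel or Red Hat documentation) that reads the two sysctl files and reports what share of each period real-time tasks are allowed to use. The helper names rt_budget_fraction and read_rt_throttling are made up for this illustration. It assumes the usual Linux semantics: sched_rt_runtime_us out of every sched_rt_period_us is available to real-time tasks, and a runtime of -1 disables the limit.

```python
from pathlib import Path
from typing import Tuple

# Standard location of the real-time throttling sysctls on Linux.
PROC_DIR = Path("/proc/sys/kernel")


def rt_budget_fraction(runtime_us: int, period_us: int) -> float:
    """Return the share of each period that real-time tasks may consume.

    A negative runtime (the "echo -1" case from the thread) means the
    limit is disabled, so real-time tasks may use the whole period.
    """
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    if runtime_us < 0:
        return 1.0
    return min(runtime_us / period_us, 1.0)


def read_rt_throttling(proc_dir: Path = PROC_DIR) -> Tuple[int, int]:
    """Read (runtime_us, period_us) from the sysctl files (Linux only)."""
    runtime_us = int((proc_dir / "sched_rt_runtime_us").read_text())
    period_us = int((proc_dir / "sched_rt_period_us").read_text())
    return runtime_us, period_us
```

On a Linux box, rt_budget_fraction(*read_rt_throttling()) gives the current budget. With the common defaults of 950000 and 1000000 this is 0.95, i.e. 5% of each second stays reserved for non-real-time work, which is the housekeeping headroom that writing -1 removes.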
