This is the second version of this series. V1 is available here:

  https://lore.kernel.org/r/20210204204903.350275...@linutronix.de

The recent effort to make the ASM entry code slim and unified moved
the irq stack switching out of the low level ASM code so that the
whole return from interrupt work and state handling can be done in C
and the ASM code just handles the true low level details of entry and
exit (which is horrible enough already due to the well thought out
architecture).

The main goal at this point was to get instrumentation and RCU state
under control in a validated way. Inlining the switch mechanism was
attempted back then, but that caused more objtool and unwinder trouble
than we had already on our plate, so we ended up with a simple,
functional but suboptimal implementation. The main issues are:

  - The unnecessary indirect call, which is expensive thanks to
    retpoline

  - The inability to stay on the irq stack for softirq processing on
    return from interrupt, which requires another stack switch
    operation

  - The fact that the stack switching code ended up being an easy to
    find exploit gadget

This series revisits the problem and reimplements the stack switch
mechanics via evil inline assembly. Peter Zijlstra provided the
required objtool and unwinder changes already. These are available
here:

  https://lore.kernel.org/r/20210203120222.451068...@infradead.org

and the latest iteration of them is available from git:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git objtool/core

The full series based on Peter's git branch is also available from
git:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/entry

All function calls are now direct and fully inlined, including the
single instance in the softirq code which is invoked from
local_bh_enable() in task context.

The extra 100 lines in the diffstat are pretty much the extensive
commentary for the whole magic, to spare everyone, including myself,
from scratching heads 2 weeks down the road.

The text size impact is in the noise and, looking at the actual entry
functions, there is, depending on the compiler variant, even a small
size decrease.

The patches have been tested with gcc8, gcc10 and clang-13 (fresh from
git). The difference between the output of these compilers is minimal,
with gcc8 being slightly worse due to stupid register selection and
random NOPs injected.

Changes vs. V1:

  - Use ASM_CALL_CONSTRAINT unconditionally (Josh)

  - New approach to handle the inlining without the extra #ifdeffery
    (Lai)

  - Added stable/fixes tag to patch 1 (Boris)

  - Style and comment updates (Boris)

  - Clarified the cacheline effect in the changelog (Peter)

  - Picked up Reviewed-by from Kees where appropriate

Thanks,

	tglx
---
 arch/Kconfig                         |   6
 arch/parisc/Kconfig                  |   1
 arch/parisc/include/asm/hardirq.h    |   4
 arch/parisc/kernel/irq.c             |   1
 arch/powerpc/Kconfig                 |   1
 arch/powerpc/include/asm/irq.h       |   2
 arch/powerpc/kernel/irq.c            |   1
 arch/s390/Kconfig                    |   1
 arch/s390/include/asm/hardirq.h      |   1
 arch/s390/kernel/irq.c               |   1
 arch/sh/Kconfig                      |   1
 arch/sh/include/asm/irq.h            |   1
 arch/sh/kernel/irq.c                 |   1
 arch/sparc/Kconfig                   |   1
 arch/sparc/include/asm/irq_64.h      |   1
 arch/sparc/kernel/irq_64.c           |   1
 arch/x86/Kconfig                     |   2
 arch/x86/entry/common.c              |  19 --
 arch/x86/entry/entry_64.S            |  41 -----
 arch/x86/include/asm/idtentry.h      |  11 -
 arch/x86/include/asm/irq.h           |   2
 arch/x86/include/asm/irq_stack.h     | 279 ++++++++++++++++++++++++-----------
 arch/x86/include/asm/processor.h     |   9 -
 arch/x86/include/asm/softirq_stack.h |  11 +
 arch/x86/kernel/apic/apic.c          |  31 ++-
 arch/x86/kernel/cpu/common.c         |   4
 arch/x86/kernel/dumpstack_64.c       |  22 ++
 arch/x86/kernel/irq.c                |   2
 arch/x86/kernel/irq_32.c             |   1
 arch/x86/kernel/irq_64.c             |  12 -
 arch/x86/kernel/process_64.c         |   2
 include/asm-generic/Kbuild           |   1
 include/asm-generic/softirq_stack.h  |  14 +
 include/linux/interrupt.h            |   9 -
 kernel/softirq.c                     |   2
 35 files changed, 303 insertions(+), 196 deletions(-)