Re: [PATCH tip-pti 2/2] x86/entry: interleave XOR register clearing with PUSH/MOV instructions

2018-02-07 Thread Linus Torvalds
On Wed, Feb 7, 2018 at 9:05 AM, Linus Torvalds wrote:
>
> The other thing we need to do is to just pass down the system call
> number as an argument instead of reloading it.
.. we may also want to disable some debug things. For example, if you enable KASAN, it does insane things for do_syscall()
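To illustrate the "pass down the system call number as an argument instead of reloading it" idea, here is a minimal userspace sketch in C. It is not kernel code: `regs_sketch`, `table_sketch`, `dispatch_reload`, `dispatch_arg`, the two toy handlers, and the `-1` error value are all hypothetical names and placeholders chosen for this example. It only contrasts reading the number back out of the saved frame with receiving it as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a saved register frame (not the kernel's struct pt_regs). */
struct regs_sketch {
    unsigned long orig_ax; /* call number as stored at entry */
    unsigned long di;      /* first argument */
};

typedef long (*handler_fn)(const struct regs_sketch *);

/* Two toy handlers, purely illustrative. */
static long sys_double(const struct regs_sketch *r) { return (long)(r->di * 2); }
static long sys_negate(const struct regs_sketch *r) { return -(long)r->di; }

static const handler_fn table_sketch[] = { sys_double, sys_negate };
#define NR_SKETCH (sizeof(table_sketch) / sizeof(table_sketch[0]))

/* Variant A: the dispatcher reloads the number from the saved frame in memory. */
static long dispatch_reload(const struct regs_sketch *r)
{
    assert(r != NULL);
    unsigned long nr = r->orig_ax; /* extra load from the frame */
    if (nr >= NR_SKETCH)
        return -1;                 /* placeholder error value */
    return table_sketch[nr](r);
}

/* Variant B: the caller hands over the number it already holds. */
static long dispatch_arg(unsigned long nr, const struct regs_sketch *r)
{
    assert(r != NULL);
    if (nr >= NR_SKETCH)
        return -1;                 /* placeholder error value */
    return table_sketch[nr](r);
}
```

Both variants select the same handler; the second simply avoids re-reading a value the entry code already has in a register.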

Re: [PATCH tip-pti 2/2] x86/entry: interleave XOR register clearing with PUSH/MOV instructions

2018-02-07 Thread Linus Torvalds
On Wed, Feb 7, 2018 at 7:18 AM, Andi Kleen wrote:
>
> Fast path saves more than just register saving. I changed the fast path
> to save all registers in my earlier clearregs branches
I know. I saw your patches. And I went "Eww".
> It is still quite a bit faster than all the slow stuff the C do_

Re: [PATCH tip-pti 2/2] x86/entry: interleave XOR register clearing with PUSH/MOV instructions

2018-02-07 Thread Andi Kleen
> Plus the fastpath couldn't clear those registers anyway, since it
> didn't even _save_ them - exactly because the whole point of the
> fastpath was that not all registers are clobbered by the calling
> conventions.
Fast path saves more than just register saving. I changed the fast path to save

Re: [PATCH tip-pti 2/2] x86/entry: interleave XOR register clearing with PUSH/MOV instructions

2018-02-06 Thread Linus Torvalds
On Tue, Feb 6, 2018 at 3:54 PM, Andi Kleen wrote:
>
> But for push, on older CPUs (older AMD, most Atoms, really old Intel big core)
> sub+mov is a lot faster than push because push has additional dependencies
> causing pipeline bubbles. So you would make these cases slower if you
> use PUSH.
I r

Re: [PATCH tip-pti 2/2] x86/entry: interleave XOR register clearing with PUSH/MOV instructions

2018-02-06 Thread Andi Kleen
> The reason for that complexity is purely the system call fastpath case
> that no longer exists, I think.
>
> Am I missing something?
Yes merging the macros should be fine without fast path. But for push, on older CPUs (older AMD, most Atoms, really old Intel big core) sub+mov is a lot faster

Re: [PATCH tip-pti 2/2] x86/entry: interleave XOR register clearing with PUSH/MOV instructions

2018-02-06 Thread Andy Lutomirski
On Tue, Feb 6, 2018 at 10:48 PM, Linus Torvalds wrote:
> On Tue, Feb 6, 2018 at 1:32 PM, Dominik Brodowski
> wrote:
>> Same as is done for syscalls, interleave XOR with PUSH or MOV
>> instructions for exceptions/interrupts, in order to minimize
>> the cost of the additional instructions required

Re: [PATCH tip-pti 2/2] x86/entry: interleave XOR register clearing with PUSH/MOV instructions

2018-02-06 Thread Linus Torvalds
On Tue, Feb 6, 2018 at 1:32 PM, Dominik Brodowski wrote:
> Same as is done for syscalls, interleave XOR with PUSH or MOV
> instructions for exceptions/interrupts, in order to minimize
> the cost of the additional instructions required for register
> clearing.
Side note: I would _really_ like to s

Re: [PATCH tip-pti 2/2] x86/entry: interleave XOR register clearing with PUSH/MOV instructions

2018-02-06 Thread Dan Williams
On Tue, Feb 6, 2018 at 1:32 PM, Dominik Brodowski wrote:
> Same as is done for syscalls, interleave XOR with PUSH or MOV
> instructions for exceptions/interrupts, in order to minimize
> the cost of the additional instructions required for register
> clearing.
>
> Signed-off-by: Dominik Brodowski

[PATCH tip-pti 2/2] x86/entry: interleave XOR register clearing with PUSH/MOV instructions

2018-02-06 Thread Dominik Brodowski
Same as is done for syscalls, interleave XOR with PUSH or MOV instructions for exceptions/interrupts, in order to minimize the cost of the additional instructions required for register clearing.
Signed-off-by: Dominik Brodowski
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
ind
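As a rough illustration of what "interleave" means here, the following userspace C sketch models two orderings of the same work: save every register and then clear them all, or save and clear one register at a time. It is only a model. The actual patch operates on assembly macros in arch/x86/entry/calling.h, and the struct names, function names, and the choice of four registers below are assumptions made for this example. The model shows that both orderings leave the same state; it says nothing about the instruction-level cost the patch is concerned with.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical "live" register file and saved frame; the register subset is arbitrary. */
struct cpu_sketch   { unsigned long r8, r9, r10, r11; };
struct frame_sketch { unsigned long r8, r9, r10, r11; };

/* Ordering 1: all saves first, then all clears as a separate run. */
static void save_then_clear(struct cpu_sketch *cpu, struct frame_sketch *f)
{
    assert(cpu && f);
    f->r8  = cpu->r8;
    f->r9  = cpu->r9;
    f->r10 = cpu->r10;
    f->r11 = cpu->r11;
    memset(cpu, 0, sizeof(*cpu)); /* models a block of XORs at the end */
}

/* Ordering 2: each save is immediately followed by the matching clear. */
static void save_and_clear_interleaved(struct cpu_sketch *cpu, struct frame_sketch *f)
{
    assert(cpu && f);
    f->r8  = cpu->r8;  cpu->r8  = 0; /* models PUSH/MOV then XOR */
    f->r9  = cpu->r9;  cpu->r9  = 0;
    f->r10 = cpu->r10; cpu->r10 = 0;
    f->r11 = cpu->r11; cpu->r11 = 0;
}
```

In both functions a register is cleared only after its value has been stored, so the saved frame is identical either way.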