Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL
On Mon, Nov 13, 2017 at 1:07 PM, Dave Hansen wrote:
> On 11/12/2017 07:52 PM, Andy Lutomirski wrote:
>> On Fri, Nov 10, 2017 at 3:04 PM, Dave Hansen wrote:
>>> On 11/10/2017 02:06 PM, Andy Lutomirski wrote:
>>>> I have nothing against disabling native. I object to breaking the
>>>> weird binary tracing behavior in the emulation mode, especially if
>>>> it's tangled up with KAISER. I got all kinds of flak in an earlier
>>>> version of the vsyscall emulation patches when I broke that use case.
>>>> KAISER may get very widely backported -- let's not make changes that
>>>> are already known to break things.
>>>
>>> Is the thing that broke a "user mode program that actually looks at the
>>> vsyscall page"? Like Linus is referring to here:
>>>
>> Yes. But I disagree with Linus. I think it would be perfectly
>> reasonable to enable KAISER and to use a tool like pin on a legacy
>> binary from some enterprise distribution. I bet there are lots of
>> enterprise distributions that are still supported that use vsyscalls.
>
> All we need to do in the end here is to re-set _PAGE_USER on the user
> page table PGD that is used by the vsyscall page. We should be able to
> do that with a line or two of code in kaiser_init(). We can do it
> conditionally on when the VDSO is not compile-time disabled.
>
> I can do this as a follow-on patch, or as the last one in the KAISER
> series and leave it up to our esteemed maintainers to decide whether
> they want to do it or not. Sound good?
>
> Are there any userspace tests around that I can use for this, or will I
> have to cook something up?

I don't. This old test might be adaptable:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/misc-tests.git/tree/test_vsyscall.cc

What you'd want to do is to add a variant that allocates some RWX
memory, memcpys the vsyscall page there, and tests that it still works
(but only if the vsyscall page worked in the first place).
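For illustration only, the test variant described above could be sketched roughly as follows. This is not taken from test_vsyscall.cc. The function names are made up for this sketch, and it assumes the fixed vsyscall address 0xffffffffff600000, a 4 KiB page, and that the gettimeofday stub at offset 0 of the page is position-independent so that a copy of it still works. The "only if the vsyscall page worked in the first place" condition is approximated by checking that /proc/self/maps lists a readable [vsyscall] mapping.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/time.h>

#define VSYSCALL_ADDR 0xffffffffff600000UL  /* fixed legacy address (assumed) */
#define VSYSCALL_SIZE 4096UL                /* one page (assumed) */

/* Return 1 if a /proc/<pid>/maps line describes a readable [vsyscall] page. */
int maps_line_is_readable_vsyscall(const char *line)
{
	unsigned long start, end;
	char perms[5];

	if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
		return 0;
	if (!strstr(line, "[vsyscall]"))
		return 0;
	return start == VSYSCALL_ADDR && perms[0] == 'r';
}

/* Scan our own maps file for a readable vsyscall page. */
int vsyscall_page_readable(void)
{
	char line[512];
	int found = 0;
	FILE *maps = fopen("/proc/self/maps", "r");

	if (!maps)
		return 0;
	while (fgets(line, sizeof(line), maps)) {
		if (maps_line_is_readable_vsyscall(line)) {
			found = 1;
			break;
		}
	}
	fclose(maps);
	return found;
}

typedef int (*vgtod_fn)(struct timeval *, struct timezone *);

/*
 * Copy the vsyscall page into RWX memory and call the copied
 * gettimeofday entry. Returns 0 on success, 1 if the test was
 * skipped, -1 if the copied code did not work.
 */
int test_copied_vsyscall_page(void)
{
	struct timeval tv;
	void *copy;
	int ret;

	if (!vsyscall_page_readable())
		return 1;	/* nothing to copy, so skip */

	copy = mmap(NULL, VSYSCALL_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (copy == MAP_FAILED)
		return 1;	/* RWX mappings not allowed here, so skip */

	memcpy(copy, (const void *)VSYSCALL_ADDR, VSYSCALL_SIZE);
	ret = ((vgtod_fn)(unsigned long)copy)(&tv, NULL);
	munmap(copy, VSYSCALL_SIZE);
	return ret == 0 ? 0 : -1;
}
```

A caller would treat a return value of 1 as "skipped" rather than as a failure, which matches the condition that the copy is only expected to work where the original page did.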
Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL
On 11/12/2017 07:52 PM, Andy Lutomirski wrote:
> On Fri, Nov 10, 2017 at 3:04 PM, Dave Hansen wrote:
>> On 11/10/2017 02:06 PM, Andy Lutomirski wrote:
>>> I have nothing against disabling native. I object to breaking the
>>> weird binary tracing behavior in the emulation mode, especially if
>>> it's tangled up with KAISER. I got all kinds of flak in an earlier
>>> version of the vsyscall emulation patches when I broke that use case.
>>> KAISER may get very widely backported -- let's not make changes that
>>> are already known to break things.
>>
>> Is the thing that broke a "user mode program that actually looks at the
>> vsyscall page"? Like Linus is referring to here:
>>
> Yes. But I disagree with Linus. I think it would be perfectly
> reasonable to enable KAISER and to use a tool like pin on a legacy
> binary from some enterprise distribution. I bet there are lots of
> enterprise distributions that are still supported that use vsyscalls.

All we need to do in the end here is to re-set _PAGE_USER on the user
page table PGD that is used by the vsyscall page. We should be able to
do that with a line or two of code in kaiser_init(). We can do it
conditionally on when the VDSO is not compile-time disabled.

I can do this as a follow-on patch, or as the last one in the KAISER
series and leave it up to our esteemed maintainers to decide whether
they want to do it or not. Sound good?

Are there any userspace tests around that I can use for this, or will I
have to cook something up?
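As a rough userspace model of the "line or two of code" idea above (this is not the kaiser_init() change itself, which is not shown in this thread): the sketch treats the user page table PGD as a plain array of 512 entries and sets the user bit on the one entry that covers the vsyscall address. All model_* names are hypothetical. The constants assume 4-level paging (PGD index taken from address bits 39..47) and the x86 U/S bit at bit 2.

```c
#include <stdint.h>

#define MODEL_PAGE_USER     (1ULL << 2)            /* x86 U/S bit */
#define MODEL_PGDIR_SHIFT   39                     /* 4-level paging (assumed) */
#define MODEL_PTRS_PER_PGD  512
#define MODEL_VSYSCALL_ADDR 0xffffffffff600000ULL  /* fixed vsyscall page */

/* Index of the top-level (PGD) entry that maps a virtual address. */
unsigned int model_pgd_index(uint64_t addr)
{
	return (unsigned int)((addr >> MODEL_PGDIR_SHIFT) &
			      (MODEL_PTRS_PER_PGD - 1));
}

/*
 * Re-set the user bit on the user-table PGD entry covering the
 * vsyscall page, but only when vsyscall support is configured in.
 * user_pgd must point at MODEL_PTRS_PER_PGD entries.
 */
void model_allow_user_vsyscall(uint64_t *user_pgd, int vsyscall_enabled)
{
	if (!vsyscall_enabled)
		return;
	user_pgd[model_pgd_index(MODEL_VSYSCALL_ADDR)] |= MODEL_PAGE_USER;
}
```

In this model the vsyscall page falls under the last PGD entry, so only that one entry changes and every other kernel-range entry keeps the user bit clear.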
Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL
On Fri, Nov 10, 2017 at 3:04 PM, Dave Hansen wrote:
> On 11/10/2017 02:06 PM, Andy Lutomirski wrote:
>> On Thu, Nov 9, 2017 at 10:31 PM, Dave Hansen wrote:
>>> On 11/09/2017 06:25 PM, Andy Lutomirski wrote:
>>>> Here are two proposals to address this without breaking vsyscalls.
>>>>
>>>> 1. Set NX on low mappings that are _PAGE_USER. Don't set NX on high
>>>> mappings but, optionally, warn if you see _PAGE_USER on any address
>>>> that isn't the vsyscall page.
>>>>
>>>> 2. Ignore _PAGE_USER entirely and just mark the EFI mm as special so
>>>> KAISER doesn't muck with it.
>>>
>>> These are totally doable. But, what's the big deal with breaking native
>>> vsyscall? We can still do the emulation so nothing breaks: it is just slow.
>>
>> I have nothing against disabling native. I object to breaking the
>> weird binary tracing behavior in the emulation mode, especially if
>> it's tangled up with KAISER. I got all kinds of flak in an earlier
>> version of the vsyscall emulation patches when I broke that use case.
>> KAISER may get very widely backported -- let's not make changes that
>> are already known to break things.
>
> Is the thing that broke a "user mode program that actually looks at the
> vsyscall page"? Like Linus is referring to here:
>

Yes. But I disagree with Linus. I think it would be perfectly
reasonable to enable KAISER and to use a tool like pin on a legacy
binary from some enterprise distribution. I bet there are lots of
enterprise distributions that are still supported that use vsyscalls.
Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL
On 11/10/2017 02:06 PM, Andy Lutomirski wrote:
> On Thu, Nov 9, 2017 at 10:31 PM, Dave Hansen wrote:
>> On 11/09/2017 06:25 PM, Andy Lutomirski wrote:
>>> Here are two proposals to address this without breaking vsyscalls.
>>>
>>> 1. Set NX on low mappings that are _PAGE_USER. Don't set NX on high
>>> mappings but, optionally, warn if you see _PAGE_USER on any address
>>> that isn't the vsyscall page.
>>>
>>> 2. Ignore _PAGE_USER entirely and just mark the EFI mm as special so
>>> KAISER doesn't muck with it.
>>
>> These are totally doable. But, what's the big deal with breaking native
>> vsyscall? We can still do the emulation so nothing breaks: it is just slow.
>
> I have nothing against disabling native. I object to breaking the
> weird binary tracing behavior in the emulation mode, especially if
> it's tangled up with KAISER. I got all kinds of flak in an earlier
> version of the vsyscall emulation patches when I broke that use case.
> KAISER may get very widely backported -- let's not make changes that
> are already known to break things.

Is the thing that broke a "user mode program that actually looks at the
vsyscall page"? Like Linus is referring to here:

> http://lkml.kernel.org/r/ca+55afyijhb4wndmkgexektzhyt8pajqsau2peo3o4ekizb...@mail.gmail.com
Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL
On Thu, Nov 9, 2017 at 10:31 PM, Dave Hansen wrote:
> On 11/09/2017 06:25 PM, Andy Lutomirski wrote:
>> Here are two proposals to address this without breaking vsyscalls.
>>
>> 1. Set NX on low mappings that are _PAGE_USER. Don't set NX on high
>> mappings but, optionally, warn if you see _PAGE_USER on any address
>> that isn't the vsyscall page.
>>
>> 2. Ignore _PAGE_USER entirely and just mark the EFI mm as special so
>> KAISER doesn't muck with it.
>
> These are totally doable. But, what's the big deal with breaking native
> vsyscall? We can still do the emulation so nothing breaks: it is just slow.

I have nothing against disabling native. I object to breaking the
weird binary tracing behavior in the emulation mode, especially if
it's tangled up with KAISER. I got all kinds of flak in an earlier
version of the vsyscall emulation patches when I broke that use case.
KAISER may get very widely backported -- let's not make changes that
are already known to break things.
[PATCH 24/30] x86, kaiser: disable native VSYSCALL
From: Dave Hansen

The KAISER code attempts to "poison" the user portion of the kernel page
tables. It detects entries that it wants to poison in two ways:
 * Looking for addresses >= PAGE_OFFSET
 * Looking for entries without _PAGE_USER set

But, to allow the _PAGE_USER check to work, it must never be set on
init_mm entries, and an earlier patch in this series ensured that it
will never be set.

The VDSO is at an address >= PAGE_OFFSET and it is also mapped by init_mm.
Because of the earlier, KAISER-enforced restriction, _PAGE_USER is never
set, which makes the VDSO unreadable to userspace.

This makes the "NATIVE" case totally unusable since userspace can not
even see the memory any more. Disable it whenever KAISER is enabled.

Also add some help text about how KAISER might affect the emulation
case as well.

Signed-off-by: Dave Hansen
Cc: Moritz Lipp
Cc: Daniel Gruss
Cc: Michael Schwarz
Cc: Richard Fellner
Cc: Andy Lutomirski
Cc: Linus Torvalds
Cc: Kees Cook
Cc: Hugh Dickins
Cc: x...@kernel.org
---

 b/arch/x86/Kconfig |    8 ++++++++
 1 file changed, 8 insertions(+)

diff -puN arch/x86/Kconfig~kaiser-no-vsyscall arch/x86/Kconfig
--- a/arch/x86/Kconfig~kaiser-no-vsyscall	2017-11-10 11:22:18.366244926 -0800
+++ b/arch/x86/Kconfig	2017-11-10 11:22:18.370244926 -0800
@@ -2231,6 +2231,9 @@ choice
 
 	config LEGACY_VSYSCALL_NATIVE
 		bool "Native"
+		# The VSYSCALL page comes from the kernel page tables
+		# and is not available when KAISER is enabled.
+		depends on ! KAISER
 		help
 		  Actual executable code is located in the fixed vsyscall
 		  address mapping, implementing time() efficiently. Since
@@ -2248,6 +2251,11 @@ choice
 		  exploits. This configuration is recommended when userspace
 		  still uses the vsyscall area.
 
+		  When KAISER is enabled, the vsyscall area will become
+		  unreadable. This emulation option still works, but KAISER
+		  will make it harder to do things like trace code using the
+		  emulation.
+
 	config LEGACY_VSYSCALL_NONE
 		bool "None"
 		help
_
Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL
On 11/09/2017 06:25 PM, Andy Lutomirski wrote:
> Here are two proposals to address this without breaking vsyscalls.
>
> 1. Set NX on low mappings that are _PAGE_USER. Don't set NX on high
> mappings but, optionally, warn if you see _PAGE_USER on any address
> that isn't the vsyscall page.
>
> 2. Ignore _PAGE_USER entirely and just mark the EFI mm as special so
> KAISER doesn't muck with it.

These are totally doable. But, what's the big deal with breaking native
vsyscall? We can still do the emulation so nothing breaks: it is just slow.
Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL
On Thu, Nov 9, 2017 at 5:22 PM, Dave Hansen wrote:
> On 11/09/2017 05:04 PM, Andy Lutomirski wrote:
>> On Thu, Nov 9, 2017 at 4:57 PM, Dave Hansen wrote:
>>> On 11/09/2017 04:53 PM, Andy Lutomirski wrote:
>>>>> The KAISER code attempts to "poison" the user portion of the kernel page
>>>>> tables. It detects the entries pages that it wants that it wants to
>>>>> poison in two ways:
>>>>> * Looking for addresses >= PAGE_OFFSET
>>>>> * Looking for entries without _PAGE_USER set
>>>> What do you mean "poison"?
>>>
>>> I meant the _PAGE_NX magic that we do in here:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-kaiser.git/commit/?h=kaiser-414rc7-20171108=c4f7d0819170761f092fcf2327b85b082368e73a
>>>
>>> to ensure that userspace is unable to run on the kernel PGD.
>>
>> Aha, I get it. Why not just drop the _PAGE_USER check? You could
>> instead warn if you see a _PAGE_USER page that doesn't have the
>> correct address for the vsyscall.
>
> The _PAGE_USER check helps us with kernel things that want to create
> mappings below PAGE_OFFSET. The EFI code was the prime user for this.
> Without this, we poison the EFI mappings and the EFI calls die.

OK, let's see if I understand. EFI and maybe some other stuff creates
low mappings with _PAGE_USER clear that are intended to be executed in
kernel mode, and, if you just set NX on all low mappings in kernel
mode, then it doesn't work.

Here are two proposals to address this without breaking vsyscalls.

1. Set NX on low mappings that are _PAGE_USER. Don't set NX on high
mappings but, optionally, warn if you see _PAGE_USER on any address
that isn't the vsyscall page.

2. Ignore _PAGE_USER entirely and just mark the EFI mm as special so
KAISER doesn't muck with it.

--Andy
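Proposal 1 above can be read as a small decision rule, modeled below in plain userspace C. This is only one interpretation for illustration, not code from the KAISER series. The model_* names are hypothetical, the PAGE_OFFSET value is the 4-level x86-64 direct-map base of that era and is an assumption, and the rule is applied per address rather than per PGD entry to keep the sketch short. A low mapping without _PAGE_USER (the EFI case) is left alone.

```c
#include <stdint.h>

#define MODEL_PAGE_USER     (1ULL << 2)            /* x86 U/S bit */
#define MODEL_PAGE_NX       (1ULL << 63)           /* x86 no-execute bit */
#define MODEL_PAGE_OFFSET   0xffff880000000000ULL  /* assumed for this sketch */
#define MODEL_VSYSCALL_ADDR 0xffffffffff600000ULL  /* fixed vsyscall page */

enum model_action {
	MODEL_LEAVE,	/* do not touch the entry */
	MODEL_SET_NX,	/* poison: userspace must not run on the kernel PGD */
	MODEL_WARN	/* unexpected _PAGE_USER on a high mapping */
};

/* Decide what to do with one kernel-table entry under proposal 1. */
enum model_action model_classify(uint64_t addr, uint64_t entry)
{
	int user = (entry & MODEL_PAGE_USER) != 0;

	if (addr < MODEL_PAGE_OFFSET)		/* low mapping */
		return user ? MODEL_SET_NX : MODEL_LEAVE;

	/* High mapping: never NX, but only the vsyscall page may be user. */
	if (user && (addr & ~0xfffULL) != MODEL_VSYSCALL_ADDR)
		return MODEL_WARN;
	return MODEL_LEAVE;
}

/* Return the entry as it would be written into the kernel tables. */
uint64_t model_apply(uint64_t addr, uint64_t entry)
{
	if (model_classify(addr, entry) == MODEL_SET_NX)
		return entry | MODEL_PAGE_NX;
	return entry;
}
```

Under this rule the vsyscall page keeps _PAGE_USER without any special handling beyond the address comparison, which is the point of the proposal.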
Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL
On 11/09/2017 05:04 PM, Andy Lutomirski wrote:
> On Thu, Nov 9, 2017 at 4:57 PM, Dave Hansen wrote:
>> On 11/09/2017 04:53 PM, Andy Lutomirski wrote:
>>>> The KAISER code attempts to "poison" the user portion of the kernel page
>>>> tables. It detects the entries pages that it wants that it wants to
>>>> poison in two ways:
>>>> * Looking for addresses >= PAGE_OFFSET
>>>> * Looking for entries without _PAGE_USER set
>>> What do you mean "poison"?
>>
>> I meant the _PAGE_NX magic that we do in here:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-kaiser.git/commit/?h=kaiser-414rc7-20171108=c4f7d0819170761f092fcf2327b85b082368e73a
>>
>> to ensure that userspace is unable to run on the kernel PGD.
>
> Aha, I get it. Why not just drop the _PAGE_USER check? You could
> instead warn if you see a _PAGE_USER page that doesn't have the
> correct address for the vsyscall.

The _PAGE_USER check helps us with kernel things that want to create
mappings below PAGE_OFFSET. The EFI code was the prime user for this.
Without this, we poison the EFI mappings and the EFI calls die.

I think there might have also been a case for the secondary CPU bringup
that needed hacking if we didn't do this.
Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL
On Thu, Nov 9, 2017 at 4:57 PM, Dave Hansen wrote:
> On 11/09/2017 04:53 PM, Andy Lutomirski wrote:
>>> The KAISER code attempts to "poison" the user portion of the kernel page
>>> tables. It detects the entries pages that it wants that it wants to
>>> poison in two ways:
>>> * Looking for addresses >= PAGE_OFFSET
>>> * Looking for entries without _PAGE_USER set
>> What do you mean "poison"?
>
> I meant the _PAGE_NX magic that we do in here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-kaiser.git/commit/?h=kaiser-414rc7-20171108=c4f7d0819170761f092fcf2327b85b082368e73a
>
> to ensure that userspace is unable to run on the kernel PGD.

Aha, I get it. Why not just drop the _PAGE_USER check? You could
instead warn if you see a _PAGE_USER page that doesn't have the
correct address for the vsyscall.
Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL
On 11/09/2017 04:53 PM, Andy Lutomirski wrote:
>> The KAISER code attempts to "poison" the user portion of the kernel page
>> tables. It detects the entries pages that it wants that it wants to
>> poison in two ways:
>> * Looking for addresses >= PAGE_OFFSET
>> * Looking for entries without _PAGE_USER set
> What do you mean "poison"?

I meant the _PAGE_NX magic that we do in here:

https://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-kaiser.git/commit/?h=kaiser-414rc7-20171108=c4f7d0819170761f092fcf2327b85b082368e73a

to ensure that userspace is unable to run on the kernel PGD.
Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL
On Thu, Nov 9, 2017 at 11:26 AM, Dave Hansen wrote:
> On 11/09/2017 11:04 AM, Andy Lutomirski wrote:
>> On Wed, Nov 8, 2017 at 11:47 AM, Dave Hansen wrote:
>>>
>>> From: Dave Hansen
>>>
>>> The VSYSCALL page is mapped by kernel page tables at a kernel address.
>>> It is troublesome to support with KAISER in place, so disable the
>>> native case.
>>>
>>> Also add some help text about how KAISER might affect the emulation
>>> case as well.
>>
>> Can you re-explain why this is helpful?
>
> How about this?
>
> The KAISER code attempts to "poison" the user portion of the kernel page
> tables. It detects the entries pages that it wants that it wants to
> poison in two ways:
> * Looking for addresses >= PAGE_OFFSET
> * Looking for entries without _PAGE_USER set

What do you mean "poison"?

Anyway, the stuff here:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/entry_stack

is an attempt to create the infrastructure needed to move (almost?)
everything needed in the user tables into the fixmap. If that ends up
working well, then perhaps the fixmap should just be completely
special-cased, in which case I think this issue goes away.

What I have in mind is something like:

set_user_fixmap(index, pa, prot);

that sets an entry in the *user* fixmap. All user mms would get the
same PGD entry for the user fixmap.

(And yes, it quite correctly fails kbuild bot right now. That's why I
haven't emailed out the patches yet.)
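To make the sharing idea above concrete, here is a toy userspace model. Only the call shape set_user_fixmap(index, pa, prot) comes from the message. The return value, the slot count, and every model_* name are invented for this sketch, and nothing here reflects the x86/entry_stack branch. A single table stands in for the page-table page behind the shared PGD entry, and each modeled mm just holds a pointer to it.

```c
#include <stdint.h>
#include <stddef.h>

#define MODEL_USER_FIXMAP_SLOTS 16	/* arbitrary size for the sketch */

struct model_fixmap_entry {
	uint64_t pa;	/* physical address backing the slot */
	uint64_t prot;	/* protection bits for the mapping */
	int present;	/* nonzero once the slot has been set */
};

struct model_user_fixmap {
	struct model_fixmap_entry slot[MODEL_USER_FIXMAP_SLOTS];
};

/* The one table every user mm points at through its "PGD entry". */
struct model_user_fixmap model_shared_user_fixmap;

struct model_mm {
	struct model_user_fixmap *user_fixmap_pgd;
};

/* Set one entry in the shared user fixmap. Returns 0, or -1 if out of range. */
int set_user_fixmap(unsigned int index, uint64_t pa, uint64_t prot)
{
	struct model_fixmap_entry *e;

	if (index >= MODEL_USER_FIXMAP_SLOTS)
		return -1;
	e = &model_shared_user_fixmap.slot[index];
	e->pa = pa;
	e->prot = prot;
	e->present = 1;
	return 0;
}

/* Every new user mm gets the same PGD entry for the user fixmap. */
void model_mm_init(struct model_mm *mm)
{
	mm->user_fixmap_pgd = &model_shared_user_fixmap;
}

/* Look up a slot as seen from one mm. Returns NULL if it is not mapped. */
const struct model_fixmap_entry *model_lookup(const struct model_mm *mm,
					      unsigned int index)
{
	if (index >= MODEL_USER_FIXMAP_SLOTS)
		return NULL;
	if (!mm->user_fixmap_pgd->slot[index].present)
		return NULL;
	return &mm->user_fixmap_pgd->slot[index];
}
```

Because the table is shared, one set_user_fixmap() call is visible from every mm without walking or updating per-process page tables, which is what would let the fixmap be special-cased as a whole.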
Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL
On 11/09/2017 11:04 AM, Andy Lutomirski wrote:
> On Wed, Nov 8, 2017 at 11:47 AM, Dave Hansen wrote:
>>
>> From: Dave Hansen
>>
>> The VSYSCALL page is mapped by kernel page tables at a kernel address.
>> It is troublesome to support with KAISER in place, so disable the
>> native case.
>>
>> Also add some help text about how KAISER might affect the emulation
>> case as well.
>
> Can you re-explain why this is helpful?

How about this?

The KAISER code attempts to "poison" the user portion of the kernel page
tables. It detects the entries that it wants to poison in two ways:
 * Looking for addresses >= PAGE_OFFSET
 * Looking for entries without _PAGE_USER set

But, to allow the _PAGE_USER check to work, we stopped it from being set
on all init_mm entries.

The VDSO is at an address >= PAGE_OFFSET and it is also mapped by the
init_mm. The fact that we remove _PAGE_USER from the page tables makes
it unreadable to userspace.

This makes the "NATIVE" case totally unusable since userspace cannot
even see the memory any more. Disable it whenever KAISER is enabled.
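To make the two checks described above concrete, here is a small userspace model of them. It is only a sketch, not KAISER code: the `model_` names are hypothetical, the PAGE_OFFSET value is one illustrative x86-64 layout (the real value depends on kernel version and KASLR), and the bit positions are the architectural x86 present and user/supervisor bits. It shows that the vsyscall address matches the address check, and that an entry with _PAGE_USER stripped matches the second check and is not readable from userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants (assumptions for this model, not taken from the patch). */
#define MODEL_PAGE_OFFSET   0xffff880000000000ULL /* varies by kernel version/KASLR */
#define MODEL_VSYSCALL_ADDR 0xffffffffff600000ULL /* fixed x86-64 vsyscall address */
#define MODEL_PAGE_PRESENT  (1ULL << 0)           /* x86 page table bit 0 */
#define MODEL_PAGE_USER     (1ULL << 2)           /* x86 page table bit 2 */

/* The vsyscall page lives in the kernel part of the address space. */
static_assert(MODEL_VSYSCALL_ADDR >= MODEL_PAGE_OFFSET,
              "vsyscall address must be a kernel address in this model");

/* Check 1: is the virtual address at or above PAGE_OFFSET? */
static bool model_addr_is_kernel(uint64_t vaddr)
{
    return vaddr >= MODEL_PAGE_OFFSET;
}

/* Check 2: does the page table entry lack _PAGE_USER? */
static bool model_entry_lacks_user(uint64_t entry)
{
    return !(entry & MODEL_PAGE_USER);
}

/* Userspace can only read through an entry that is present and has _PAGE_USER. */
static bool model_user_can_read(uint64_t entry)
{
    return (entry & MODEL_PAGE_PRESENT) && (entry & MODEL_PAGE_USER);
}
```

With these helpers, `model_addr_is_kernel(MODEL_VSYSCALL_ADDR)` is true, and an entry of `MODEL_PAGE_PRESENT` alone (that is, _PAGE_USER removed) makes `model_user_can_read()` return false, which is the NATIVE failure described above.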
Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL
On Wed, Nov 8, 2017 at 11:47 AM, Dave Hansen wrote:
>
> From: Dave Hansen
>
> The VSYSCALL page is mapped by kernel page tables at a kernel address.
> It is troublesome to support with KAISER in place, so disable the
> native case.
>
> Also add some help text about how KAISER might affect the emulation
> case as well.

Can you re-explain why this is helpful?

Also, I'm about to send patches that may cause a rethinking of how
KAISER handles the fixmap.

--Andy

>
> Signed-off-by: Dave Hansen
> Cc: Moritz Lipp
> Cc: Daniel Gruss
> Cc: Michael Schwarz
> Cc: Richard Fellner
> Cc: Andy Lutomirski
> Cc: Linus Torvalds
> Cc: Kees Cook
> Cc: Hugh Dickins
> Cc: x...@kernel.org
>
> ---
>
>  b/arch/x86/Kconfig |    8
>  1 file changed, 8 insertions(+)
>
> diff -puN arch/x86/Kconfig~kaiser-no-vsyscall arch/x86/Kconfig
> --- a/arch/x86/Kconfig~kaiser-no-vsyscall 2017-11-08 10:45:39.157681370 -0800
> +++ b/arch/x86/Kconfig 2017-11-08 10:45:39.162681370 -0800
> @@ -2231,6 +2231,9 @@ choice
>
>      config LEGACY_VSYSCALL_NATIVE
>          bool "Native"
> +        # The VSYSCALL page comes from the kernel page tables
> +        # and is not available when KAISER is enabled.
> +        depends on ! KAISER
>          help
>            Actual executable code is located in the fixed vsyscall
>            address mapping, implementing time() efficiently. Since
> @@ -2248,6 +2251,11 @@ choice
>            exploits. This configuration is recommended when userspace
>            still uses the vsyscall area.
>
> +          When KAISER is enabled, the vsyscall area will become
> +          unreadable. This emulation option still works, but KAISER
> +          will make it harder to do things like trace code using the
> +          emulation.
> +
>      config LEGACY_VSYSCALL_NONE
>          bool "None"
>          help
> _
[PATCH 24/30] x86, kaiser: disable native VSYSCALL
From: Dave Hansen

The VSYSCALL page is mapped by kernel page tables at a kernel address.
It is troublesome to support with KAISER in place, so disable the
native case.

Also add some help text about how KAISER might affect the emulation
case as well.

Signed-off-by: Dave Hansen
Cc: Moritz Lipp
Cc: Daniel Gruss
Cc: Michael Schwarz
Cc: Richard Fellner
Cc: Andy Lutomirski
Cc: Linus Torvalds
Cc: Kees Cook
Cc: Hugh Dickins
Cc: x...@kernel.org

---

 b/arch/x86/Kconfig |    8
 1 file changed, 8 insertions(+)

diff -puN arch/x86/Kconfig~kaiser-no-vsyscall arch/x86/Kconfig
--- a/arch/x86/Kconfig~kaiser-no-vsyscall 2017-11-08 10:45:39.157681370 -0800
+++ b/arch/x86/Kconfig 2017-11-08 10:45:39.162681370 -0800
@@ -2231,6 +2231,9 @@ choice

     config LEGACY_VSYSCALL_NATIVE
         bool "Native"
+        # The VSYSCALL page comes from the kernel page tables
+        # and is not available when KAISER is enabled.
+        depends on ! KAISER
         help
           Actual executable code is located in the fixed vsyscall
           address mapping, implementing time() efficiently. Since
@@ -2248,6 +2251,11 @@ choice
           exploits. This configuration is recommended when userspace
           still uses the vsyscall area.

+          When KAISER is enabled, the vsyscall area will become
+          unreadable. This emulation option still works, but KAISER
+          will make it harder to do things like trace code using the
+          emulation.
+
     config LEGACY_VSYSCALL_NONE
         bool "None"
         help
_
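The help text above says the vsyscall area "will become unreadable". One way to observe that from userspace is to try reading a byte from the fixed vsyscall address and catch the fault. The following is a rough sketch only, assuming x86-64 Linux; `probe_readable()` and `vsyscall_page_readable()` are hypothetical helper names, not part of the patch or of any existing test suite. It uses a global jump buffer and is therefore not thread-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Fixed x86-64 vsyscall page address; this sketch assumes a 64-bit build. */
#define VSYSCALL_ADDR 0xffffffffff600000UL
static_assert(sizeof(void *) == 8, "this sketch assumes a 64-bit address space");

static sigjmp_buf probe_env;

/* Fault handler: jump back into probe_readable() instead of dying. */
static void probe_fault(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Return 1 if one byte at addr can be read, 0 if the read faults. */
static int probe_readable(const volatile unsigned char *addr)
{
    struct sigaction sa, old_segv, old_bus;
    volatile int ok = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = probe_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* savemask=1 so the signal mask is restored after siglongjmp(). */
    if (sigsetjmp(probe_env, 1) == 0) {
        (void)*addr; /* volatile read; faults if the page is not readable */
        ok = 1;
    }

    /* Put the previous handlers back before returning. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return ok;
}

/* Convenience wrapper: can this process read the vsyscall page? */
static int vsyscall_page_readable(void)
{
    return probe_readable((const volatile unsigned char *)VSYSCALL_ADDR);
}
```

A caller would simply print the result of `vsyscall_page_readable()`. The result depends on the running kernel's vsyscall configuration, so a 0 here does not by itself prove that KAISER is the cause.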