Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-13 Thread Andy Lutomirski
On Mon, Nov 13, 2017 at 1:07 PM, Dave Hansen
 wrote:
> On 11/12/2017 07:52 PM, Andy Lutomirski wrote:
>> On Fri, Nov 10, 2017 at 3:04 PM, Dave Hansen
>>  wrote:
>>> On 11/10/2017 02:06 PM, Andy Lutomirski wrote:
 I have nothing against disabling native.  I object to breaking the
 weird binary tracing behavior in the emulation mode, especially if
 it's tangled up with KAISER.  I got all kinds of flak in an earlier
 version of the vsyscall emulation patches when I broke that use case.
 KAISER may get very widely backported -- let's not make changes that
 are already known to break things.
>>>
>>> Is the thing that broke a "user mode program that actually looks at the
>>> vsyscall page"?  Like Linus is referring to here:
>>>
>> Yes.  But I disagree with Linus.  I think it would be perfectly
>> reasonable to enable KAISER and to use a tool like pin on a legacy
>> binary from some enterprise distribution.  I bet there are lots of
>> enterprise distributions that are still supported that use vsyscalls.
>
> All we need to do in the end here is to re-set _PAGE_USER on the user
> page table PGD entry that is used by the vsyscall page.  We should be
> able to do that with a line or two of code in kaiser_init().  We can do
> it conditionally, only when the vsyscall page is not compile-time disabled.
>
> I can do this as a follow-on patch, or as the last one in the KAISER
> series and leave it up to our esteemed maintainers to decide whether
> they want to do it or not.  Sound good?
>
> Are there any userspace tests around that I can use for this, or will I
> have to cook something up?

I don't know of any.  This old test might be adaptable:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/misc-tests.git/tree/test_vsyscall.cc

What you'd want to do is to add a variant that allocates some RWX
memory, memcpys the vsyscall page there, and tests that it still works
(but only if the vsyscall page worked in the first place).
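A minimal C sketch of that variant, assuming an x86-64 Linux host. It first checks /proc/self/maps for a readable [vsyscall] mapping (readable under vsyscall=emulate; absent with vsyscall=none and execute-only with vsyscall=xonly), then copies the page into freshly mapped RWX memory and calls offset 0, which in emulate mode holds the legacy gettimeofday() stub (mov $__NR_gettimeofday, %rax; syscall; ret). The function names and the 5-second skew tolerance are illustrative choices, not taken from test_vsyscall.cc.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <time.h>

#define VSYSCALL_BASE 0xffffffffff600000UL /* fixed x86-64 vsyscall page */
#define VSYSCALL_SIZE 4096UL

/* Does /proc/self/maps list a readable [vsyscall] mapping? */
static int vsyscall_readable(void)
{
    char line[512], perms[8];
    int readable = 0;
    FILE *maps = fopen("/proc/self/maps", "r");

    if (!maps)
        return 0;
    while (fgets(line, sizeof(line), maps)) {
        /* Second field is the permission string, e.g. "r-xp". */
        if (strstr(line, "[vsyscall]") &&
            sscanf(line, "%*s %7s", perms) == 1 && perms[0] == 'r')
            readable = 1;
    }
    fclose(maps);
    return readable;
}

/* Returns 1 on pass, 0 on failure, -1 if the test had to be skipped. */
static int test_copied_vsyscall(void)
{
#if defined(__x86_64__)
    typedef int (*gettimeofday_fn)(struct timeval *, struct timezone *);
    struct timeval tv;
    void *copy;
    int ret;
    long skew;

    if (!vsyscall_readable())
        return -1; /* nothing to copy: vsyscall=none or xonly */

    copy = mmap(NULL, VSYSCALL_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (copy == MAP_FAILED)
        return -1; /* e.g. a W^X policy forbids RWX memory */

    /* Copy the code bytes out, as a binary tracer like pin would. */
    memcpy(copy, (const void *)VSYSCALL_BASE, VSYSCALL_SIZE);
    __builtin___clear_cache((char *)copy, (char *)copy + VSYSCALL_SIZE);

    /* Offset 0 is the legacy gettimeofday() entry point. */
    ret = ((gettimeofday_fn)copy)(&tv, NULL);
    skew = (long)(time(NULL) - tv.tv_sec);
    munmap(copy, VSYSCALL_SIZE);

    return ret == 0 && skew >= -5 && skew <= 5;
#else
    return -1; /* vsyscalls only exist on x86-64 */
#endif
}
```

Returning -1 rather than failing when the page is unreadable mirrors the "only if the vsyscall page worked in the first place" condition.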


Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-13 Thread Dave Hansen
On 11/12/2017 07:52 PM, Andy Lutomirski wrote:
> On Fri, Nov 10, 2017 at 3:04 PM, Dave Hansen
>  wrote:
>> On 11/10/2017 02:06 PM, Andy Lutomirski wrote:
>>> I have nothing against disabling native.  I object to breaking the
>>> weird binary tracing behavior in the emulation mode, especially if
>>> it's tangled up with KAISER.  I got all kinds of flak in an earlier
>>> version of the vsyscall emulation patches when I broke that use case.
>>> KAISER may get very widely backported -- let's not make changes that
>>> are already known to break things.
>>
>> Is the thing that broke a "user mode program that actually looks at the
>> vsyscall page"?  Like Linus is referring to here:
>>
> Yes.  But I disagree with Linus.  I think it would be perfectly
> reasonable to enable KAISER and to use a tool like pin on a legacy
> binary from some enterprise distribution.  I bet there are lots of
> enterprise distributions that are still supported that use vsyscalls.

All we need to do in the end here is to re-set _PAGE_USER on the user
page table PGD entry that is used by the vsyscall page.  We should be
able to do that with a line or two of code in kaiser_init().  We can do
it conditionally, only when the vsyscall page is not compile-time disabled.

I can do this as a follow-on patch, or as the last one in the KAISER
series and leave it up to our esteemed maintainers to decide whether
they want to do it or not.  Sound good?

Are there any userspace tests around that I can use for this, or will I
have to cook something up?
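The kaiser_init() tweak proposed in this message can be illustrated with a small user-space model; this is not kernel code. model_user_pgd, model_reset() and model_kaiser_init() are hypothetical stand-ins, and the vsyscall_compiled_in flag represents whatever compile-time check would be used (presumably something along the lines of IS_ENABLED(CONFIG_X86_VSYSCALL_EMULATION)). It only shows that the change amounts to OR-ing _PAGE_USER (bit 2 on x86) into the single user-PGD entry covering 0xffffffffff600000.

```c
#include <stdint.h>
#include <string.h>

#define PGDIR_SHIFT   39
#define PTRS_PER_PGD  512
#define PAGE_USER_BIT (1ULL << 2)            /* x86 _PAGE_USER */
#define VSYSCALL_ADDR 0xffffffffff600000ULL  /* fixed vsyscall page */

/* Hypothetical stand-in for the user (shadow) copy of the PGD page. */
static uint64_t model_user_pgd[PTRS_PER_PGD];

static unsigned pgd_index(uint64_t addr)
{
    return (unsigned)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

static void model_reset(void)
{
    memset(model_user_pgd, 0, sizeof(model_user_pgd));
}

/* Model of the "line or two" in kaiser_init(): make the user PGD entry
 * that covers the vsyscall page user-accessible again, but only when
 * vsyscall support is compiled in. */
static void model_kaiser_init(int vsyscall_compiled_in)
{
    if (vsyscall_compiled_in)
        model_user_pgd[pgd_index(VSYSCALL_ADDR)] |= PAGE_USER_BIT;
}
```

The vsyscall address lands in the last PGD slot (index 511), so no other user-PGD entry is touched.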


Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-12 Thread Andy Lutomirski
On Fri, Nov 10, 2017 at 3:04 PM, Dave Hansen
 wrote:
> On 11/10/2017 02:06 PM, Andy Lutomirski wrote:
>> On Thu, Nov 9, 2017 at 10:31 PM, Dave Hansen
>>  wrote:
>>> On 11/09/2017 06:25 PM, Andy Lutomirski wrote:
 Here are two proposals to address this without breaking vsyscalls.

 1. Set NX on low mappings that are _PAGE_USER.  Don't set NX on high
 mappings but, optionally, warn if you see _PAGE_USER on any address
 that isn't the vsyscall page.

 2. Ignore _PAGE_USER entirely and just mark the EFI mm as special so
 KAISER doesn't muck with it.
>>>
>>> These are totally doable.  But, what's the big deal with breaking native
>>> vsyscall?  We can still do the emulation so nothing breaks: it is just slow.
>>
>> I have nothing against disabling native.  I object to breaking the
>> weird binary tracing behavior in the emulation mode, especially if
>> it's tangled up with KAISER.  I got all kinds of flak in an earlier
>> version of the vsyscall emulation patches when I broke that use case.
>> KAISER may get very widely backported -- let's not make changes that
>> are already known to break things.
>
> Is the thing that broke a "user mode program that actually looks at the
> vsyscall page"?  Like Linus is referring to here:
>

Yes.  But I disagree with Linus.  I think it would be perfectly
reasonable to enable KAISER and to use a tool like pin on a legacy
binary from some enterprise distribution.  I bet there are lots of
enterprise distributions that are still supported that use vsyscalls.


Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-10 Thread Dave Hansen
On 11/10/2017 02:06 PM, Andy Lutomirski wrote:
> On Thu, Nov 9, 2017 at 10:31 PM, Dave Hansen
>  wrote:
>> On 11/09/2017 06:25 PM, Andy Lutomirski wrote:
>>> Here are two proposals to address this without breaking vsyscalls.
>>>
>>> 1. Set NX on low mappings that are _PAGE_USER.  Don't set NX on high
>>> mappings but, optionally, warn if you see _PAGE_USER on any address
>>> that isn't the vsyscall page.
>>>
>>> 2. Ignore _PAGE_USER entirely and just mark the EFI mm as special so
>>> KAISER doesn't muck with it.
>>
>> These are totally doable.  But, what's the big deal with breaking native
>> vsyscall?  We can still do the emulation so nothing breaks: it is just slow.
> 
> I have nothing against disabling native.  I object to breaking the
> weird binary tracing behavior in the emulation mode, especially if
> it's tangled up with KAISER.  I got all kinds of flak in an earlier
> version of the vsyscall emulation patches when I broke that use case.
> KAISER may get very widely backported -- let's not make changes that
> are already known to break things.

Is the thing that broke a "user mode program that actually looks at the
vsyscall page"?  Like Linus is referring to here:

> http://lkml.kernel.org/r/ca+55afyijhb4wndmkgexektzhyt8pajqsau2peo3o4ekizb...@mail.gmail.com


Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-10 Thread Andy Lutomirski
On Thu, Nov 9, 2017 at 10:31 PM, Dave Hansen
 wrote:
> On 11/09/2017 06:25 PM, Andy Lutomirski wrote:
>> Here are two proposals to address this without breaking vsyscalls.
>>
>> 1. Set NX on low mappings that are _PAGE_USER.  Don't set NX on high
>> mappings but, optionally, warn if you see _PAGE_USER on any address
>> that isn't the vsyscall page.
>>
>> 2. Ignore _PAGE_USER entirely and just mark the EFI mm as special so
>> KAISER doesn't muck with it.
>
> These are totally doable.  But, what's the big deal with breaking native
> vsyscall?  We can still do the emulation so nothing breaks: it is just slow.

I have nothing against disabling native.  I object to breaking the
weird binary tracing behavior in the emulation mode, especially if
it's tangled up with KAISER.  I got all kinds of flak in an earlier
version of the vsyscall emulation patches when I broke that use case.
KAISER may get very widely backported -- let's not make changes that
are already known to break things.


[PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-10 Thread Dave Hansen

From: Dave Hansen 

The KAISER code attempts to "poison" the user portion of the kernel page
tables.  It detects the entries that it wants to poison in two ways:
 * Looking for addresses >= PAGE_OFFSET
 * Looking for entries without _PAGE_USER set

But, to allow the _PAGE_USER check to work, it must never be set on
init_mm entries, and an earlier patch in this series ensured that it
will never be set.

The vsyscall page is at an address >= PAGE_OFFSET and is also mapped by
init_mm.  Because of the earlier, KAISER-enforced restriction, _PAGE_USER
is never set on it, which makes the vsyscall page unreadable to userspace.

This makes the "NATIVE" case totally unusable, since userspace cannot
even see the memory anymore.  Disable it whenever KAISER is enabled.

Also add some help text about how KAISER might affect the emulation
case as well.

Signed-off-by: Dave Hansen 
Cc: Moritz Lipp 
Cc: Daniel Gruss 
Cc: Michael Schwarz 
Cc: Richard Fellner 
Cc: Andy Lutomirski 
Cc: Linus Torvalds 
Cc: Kees Cook 
Cc: Hugh Dickins 
Cc: x...@kernel.org

---

 b/arch/x86/Kconfig |    8 ++++++++
 1 file changed, 8 insertions(+)

diff -puN arch/x86/Kconfig~kaiser-no-vsyscall arch/x86/Kconfig
--- a/arch/x86/Kconfig~kaiser-no-vsyscall   2017-11-10 11:22:18.366244926 -0800
+++ b/arch/x86/Kconfig  2017-11-10 11:22:18.370244926 -0800
@@ -2231,6 +2231,9 @@ choice
 
config LEGACY_VSYSCALL_NATIVE
bool "Native"
+   # The VSYSCALL page comes from the kernel page tables
+   # and is not available when KAISER is enabled.
+   depends on ! KAISER
help
  Actual executable code is located in the fixed vsyscall
  address mapping, implementing time() efficiently. Since
@@ -2248,6 +2251,11 @@ choice
  exploits. This configuration is recommended when userspace
  still uses the vsyscall area.
 
+ When KAISER is enabled, the vsyscall area will become
+ unreadable.  This emulation option still works, but KAISER
+ will make it harder to do things like trace code using the
+ emulation.
+
config LEGACY_VSYSCALL_NONE
bool "None"
help
_


Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-09 Thread Dave Hansen
On 11/09/2017 06:25 PM, Andy Lutomirski wrote:
> Here are two proposals to address this without breaking vsyscalls.
> 
> 1. Set NX on low mappings that are _PAGE_USER.  Don't set NX on high
> mappings but, optionally, warn if you see _PAGE_USER on any address
> that isn't the vsyscall page.
> 
> 2. Ignore _PAGE_USER entirely and just mark the EFI mm as special so
> KAISER doesn't muck with it.

These are totally doable.  But, what's the big deal with breaking native
vsyscall?  We can still do the emulation so nothing breaks: it is just slow.
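To put a rough number on "it is just slow", here is a C microbenchmark sketch, assuming x86-64 Linux. It calls the legacy time() entry at 0xffffffffff600400, which traps into the kernel and is emulated when the page is emulated or execute-only, and compares that with glibc's time(), which normally goes through the vDSO. It returns -1.0 when /proc/self/maps lists no executable [vsyscall] mapping (for example with vsyscall=none). The helper names are illustrative, and the ratio is environment-dependent.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <time.h>

#define VSYSCALL_TIME_ADDR 0xffffffffff600400UL /* legacy time() entry */

typedef time_t (*time_fn)(time_t *);

/* Does /proc/self/maps list an executable [vsyscall] mapping? */
static int vsyscall_executable(void)
{
    char line[512], perms[8];
    int exec = 0;
    FILE *maps = fopen("/proc/self/maps", "r");

    if (!maps)
        return 0;
    while (fgets(line, sizeof(line), maps))
        if (strstr(line, "[vsyscall]") &&
            sscanf(line, "%*s %7s", perms) == 1 && perms[2] == 'x')
            exec = 1;
    fclose(maps);
    return exec;
}

/* Average wall-clock nanoseconds per call of fn(NULL). */
static double ns_per_call(time_fn fn, int iters)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iters; i++)
        (void)fn(NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return ((double)(end.tv_sec - start.tv_sec) * 1e9 +
            (double)(end.tv_nsec - start.tv_nsec)) / iters;
}

/* Cost of the vsyscall entry relative to glibc time(), or -1.0. */
static double vsyscall_slowdown(int iters)
{
#if defined(__x86_64__)
    double legacy, vdso;

    if (iters <= 0 || !vsyscall_executable())
        return -1.0;
    legacy = ns_per_call((time_fn)VSYSCALL_TIME_ADDR, iters);
    vdso = ns_per_call(time, iters);
    return vdso > 0.0 ? legacy / vdso : -1.0;
#else
    (void)iters;
    return -1.0;
#endif
}
```

Under emulation every legacy call costs a page fault plus the emulated syscall, which is the slowdown referred to above.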


Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-09 Thread Andy Lutomirski
On Thu, Nov 9, 2017 at 5:22 PM, Dave Hansen  wrote:
> On 11/09/2017 05:04 PM, Andy Lutomirski wrote:
>> On Thu, Nov 9, 2017 at 4:57 PM, Dave Hansen  
>> wrote:
>>> On 11/09/2017 04:53 PM, Andy Lutomirski wrote:
> The KAISER code attempts to "poison" the user portion of the kernel page
> tables.  It detects the entries that it wants to poison in two ways:
>  * Looking for addresses >= PAGE_OFFSET
>  * Looking for entries without _PAGE_USER set
 What do you mean "poison"?
>>>
>>> I meant the _PAGE_NX magic that we do in here:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-kaiser.git/commit/?h=kaiser-414rc7-20171108&id=c4f7d0819170761f092fcf2327b85b082368e73a
>>>
>>> to ensure that userspace is unable to run on the kernel PGD.
>>
>> Aha, I get it.  Why not just drop the _PAGE_USER check?  You could
>> instead warn if you see a _PAGE_USER page that doesn't have the
>> correct address for the vsyscall.
>
> The _PAGE_USER check helps us with kernel things that want to create
> mappings below PAGE_OFFSET.  The EFI code was the prime user for this.
> Without this, we poison the EFI mappings and the EFI calls die.

OK, let's see if I understand.  EFI and maybe some other stuff creates
low mappings with _PAGE_USER clear that are intended to be executed in
kernel mode, and, if you just set NX on all low mappings in kernel
mode, then it doesn't work.

Here are two proposals to address this without breaking vsyscalls.

1. Set NX on low mappings that are _PAGE_USER.  Don't set NX on high
mappings but, optionally, warn if you see _PAGE_USER on any address
that isn't the vsyscall page.

2. Ignore _PAGE_USER entirely and just mark the EFI mm as special so
KAISER doesn't muck with it.

--Andy
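Andy's two proposals can be sketched as pure decision functions in C. This is a model, not the KAISER code: MODEL_PAGE_OFFSET is an illustrative 4-level, non-KASLR value, struct model_mm and its kaiser_exempt flag are hypothetical, and "low" is taken to mean below PAGE_OFFSET, as in the patch description. The _PAGE_USER (bit 2) and _PAGE_NX (bit 63) positions and the vsyscall address match x86-64.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PGDIR_SHIFT       39
#define PTRS_PER_PGD      512
#define PAGE_USER_BIT     (1ULL << 2)            /* x86 _PAGE_USER */
#define PAGE_NX_BIT       (1ULL << 63)           /* x86 _PAGE_NX */
#define VSYSCALL_ADDR     0xffffffffff600000ULL  /* fixed vsyscall page */
#define MODEL_PAGE_OFFSET 0xffff880000000000ULL  /* illustrative only */

static unsigned pgd_index(uint64_t addr)
{
    return (unsigned)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/* Proposal 1: NX on low mappings that have _PAGE_USER; never NX on high
 * mappings, but report _PAGE_USER on any high entry that does not cover
 * the vsyscall page. Returns the (possibly poisoned) entry. */
static uint64_t proposal1_fixup(uint64_t addr, uint64_t entry, bool *warned)
{
    if (addr < MODEL_PAGE_OFFSET)
        return (entry & PAGE_USER_BIT) ? (entry | PAGE_NX_BIT) : entry;

    if ((entry & PAGE_USER_BIT) &&
        pgd_index(addr) != pgd_index(VSYSCALL_ADDR)) {
        fprintf(stderr, "unexpected _PAGE_USER at %#llx\n",
                (unsigned long long)addr);
        *warned = true;
    }
    return entry;
}

/* Proposal 2: ignore _PAGE_USER entirely; leave mms marked special
 * (such as the EFI mm) alone and poison every other low mapping. */
struct model_mm {
    bool kaiser_exempt; /* hypothetical flag, set only for the EFI mm */
};

static uint64_t proposal2_fixup(const struct model_mm *mm, uint64_t addr,
                                uint64_t entry)
{
    if (mm->kaiser_exempt || addr >= MODEL_PAGE_OFFSET)
        return entry;
    return entry | PAGE_NX_BIT;
}
```

Proposal 1 keeps EFI-style low kernel mappings (no _PAGE_USER) executable by looking at the page bit, while proposal 2 makes the same decision from a per-mm flag instead.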


Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-09 Thread Dave Hansen
On 11/09/2017 05:04 PM, Andy Lutomirski wrote:
> On Thu, Nov 9, 2017 at 4:57 PM, Dave Hansen  
> wrote:
>> On 11/09/2017 04:53 PM, Andy Lutomirski wrote:
 The KAISER code attempts to "poison" the user portion of the kernel page
 tables.  It detects the entries that it wants to poison in two ways:
  * Looking for addresses >= PAGE_OFFSET
  * Looking for entries without _PAGE_USER set
>>> What do you mean "poison"?
>>
>> I meant the _PAGE_NX magic that we do in here:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-kaiser.git/commit/?h=kaiser-414rc7-20171108&id=c4f7d0819170761f092fcf2327b85b082368e73a
>>
>> to ensure that userspace is unable to run on the kernel PGD.
> 
> Aha, I get it.  Why not just drop the _PAGE_USER check?  You could
> instead warn if you see a _PAGE_USER page that doesn't have the
> correct address for the vsyscall.

The _PAGE_USER check helps us with kernel things that want to create
mappings below PAGE_OFFSET.  The EFI code was the prime user for this.
Without this, we poison the EFI mappings and the EFI calls die.

I think there might have also been a case for the secondary CPU bringup
that needed hacking if we didn't do this.




Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-09 Thread Andy Lutomirski
On Thu, Nov 9, 2017 at 4:57 PM, Dave Hansen  wrote:
> On 11/09/2017 04:53 PM, Andy Lutomirski wrote:
>>> The KAISER code attempts to "poison" the user portion of the kernel page
>>> tables.  It detects the entries that it wants to poison in two ways:
>>>  * Looking for addresses >= PAGE_OFFSET
>>>  * Looking for entries without _PAGE_USER set
>> What do you mean "poison"?
>
> I meant the _PAGE_NX magic that we do in here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-kaiser.git/commit/?h=kaiser-414rc7-20171108&id=c4f7d0819170761f092fcf2327b85b082368e73a
>
> to ensure that userspace is unable to run on the kernel PGD.

Aha, I get it.  Why not just drop the _PAGE_USER check?  You could
instead warn if you see a _PAGE_USER page that doesn't have the
correct address for the vsyscall.


Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-09 Thread Dave Hansen
On 11/09/2017 04:53 PM, Andy Lutomirski wrote:
>> The KAISER code attempts to "poison" the user portion of the kernel page
>> tables.  It detects the entries that it wants to poison in two ways:
>>  * Looking for addresses >= PAGE_OFFSET
>>  * Looking for entries without _PAGE_USER set
> What do you mean "poison"?

I meant the _PAGE_NX magic that we do in here:

https://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-kaiser.git/commit/?h=kaiser-414rc7-20171108&id=c4f7d0819170761f092fcf2327b85b082368e73a

to ensure that userspace is unable to run on the kernel PGD.


Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-09 Thread Andy Lutomirski
On Thu, Nov 9, 2017 at 11:26 AM, Dave Hansen
 wrote:
> On 11/09/2017 11:04 AM, Andy Lutomirski wrote:
>> On Wed, Nov 8, 2017 at 11:47 AM, Dave Hansen
>>  wrote:
>>>
>>> From: Dave Hansen 
>>>
>>> The VSYSCALL page is mapped by kernel page tables at a kernel address.
>>> It is troublesome to support with KAISER in place, so disable the
>>> native case.
>>>
>>> Also add some help text about how KAISER might affect the emulation
>>> case as well.
>>
>> Can you re-explain why this is helpful?
>
> How about this?
>
> The KAISER code attempts to "poison" the user portion of the kernel page
> tables.  It detects the entries that it wants to poison in two ways:
>  * Looking for addresses >= PAGE_OFFSET
>  * Looking for entries without _PAGE_USER set

What do you mean "poison"?

Anyway, the stuff here:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/entry_stack

is an attempt to create the infrastructure needed to move (almost?)
everything needed in the user tables into the fixmap.  If that ends up
working well, then perhaps the fixmap should just be completely
special-cased, in which case I think this issue goes away.  What I
have in mind is something like:

set_user_fixmap(index, pa, prot);

that sets an entry in the *user* fixmap.  All user mms would get the
same PGD entry for the user fixmap.

(And yes, it quite correctly fails the kbuild bot right now.  That's why I
haven't emailed out the patches yet.)
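The shared user-fixmap idea can be modeled in a few lines of C, purely as an illustration: every mm's top-level slot for the user fixmap points at one shared table, so a single write through model_set_user_fixmap() is visible in all mms. model_set_user_fixmap() is a hypothetical analogue of the set_user_fixmap(index, pa, prot) Andy has in mind, not an existing kernel function; the table size is arbitrary, and MODEL_PTE_PFN_MASK is the x86-64 physical-address mask for 52-bit physical addresses.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_FIXMAP_SLOTS 16                    /* arbitrary model size */
#define MODEL_PTE_PFN_MASK 0x000ffffffffff000ULL /* PTE physical-address bits */

/* One table standing in for the page table behind the user fixmap. */
struct model_fixmap_table {
    uint64_t pte[MODEL_FIXMAP_SLOTS];
};

static struct model_fixmap_table model_user_fixmap;

/* Each mm's PGD slot for the user fixmap points at the same table. */
struct model_mm {
    struct model_fixmap_table *user_fixmap_pgd;
};

static void model_mm_init(struct model_mm *mm)
{
    mm->user_fixmap_pgd = &model_user_fixmap;
}

/* Hypothetical analogue of set_user_fixmap(index, pa, prot): one write,
 * seen by every mm because they all share the table. */
static int model_set_user_fixmap(size_t index, uint64_t pa, uint64_t prot)
{
    if (index >= MODEL_FIXMAP_SLOTS)
        return -1;
    model_user_fixmap.pte[index] =
        (pa & MODEL_PTE_PFN_MASK) | (prot & ~MODEL_PTE_PFN_MASK);
    return 0;
}
```

Because the top-level entry is shared rather than copied per mm, no per-process fixups are needed when the fixmap changes.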


Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-09 Thread Dave Hansen
On 11/09/2017 11:04 AM, Andy Lutomirski wrote:
> On Wed, Nov 8, 2017 at 11:47 AM, Dave Hansen
>  wrote:
>>
>> From: Dave Hansen 
>>
>> The VSYSCALL page is mapped by kernel page tables at a kernel address.
>> It is troublesome to support with KAISER in place, so disable the
>> native case.
>>
>> Also add some help text about how KAISER might affect the emulation
>> case as well.
> 
> Can you re-explain why this is helpful?

How about this?

The KAISER code attempts to "poison" the user portion of the kernel page
tables.  It detects the entries that it wants to
poison in two ways:
 * Looking for addresses >= PAGE_OFFSET
 * Looking for entries without _PAGE_USER set
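A minimal sketch of the two checks in the list, assuming they are alternatives (an entry matching either one is poisoned). TOY_PAGE_OFFSET is an illustrative value, not the real PAGE_OFFSET (which depends on configuration and KASLR), and kaiser_would_poison() is a hypothetical helper, not a function from the KAISER patches. TOY_PAGE_USER stands in for _PAGE_USER, the x86 U/S bit, which is bit 2 of a page table entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values; the real PAGE_OFFSET depends on config and KASLR. */
#define TOY_PAGE_OFFSET 0xffff880000000000ULL
#define TOY_PAGE_USER   (1ULL << 2)   /* x86 U/S bit: bit 2 of an entry */

static_assert(TOY_PAGE_USER == 0x4, "_PAGE_USER is bit 2 on x86");

/* Hypothetical helper: true if an entry mapping addr would be poisoned. */
static bool kaiser_would_poison(uint64_t addr, uint64_t entry_flags)
{
	if (addr >= TOY_PAGE_OFFSET)
		return true;            /* check 1: kernel address range */
	if (!(entry_flags & TOY_PAGE_USER))
		return true;            /* check 2: _PAGE_USER not set */
	return false;
}
```

Under this sketch the vsyscall page, at 0xffffffffff600000, matches the first check regardless of its flags.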

But, to allow the _PAGE_USER check to work, we stopped _PAGE_USER from
being set on all init_mm entries.

The vsyscall page is at an address >= PAGE_OFFSET and it is also mapped
by the init_mm.  Because we remove _PAGE_USER from those page tables,
the page is unreadable to userspace.

This makes the "NATIVE" case totally unusable, since userspace cannot
even see the memory anymore.  Disable it whenever KAISER is enabled.
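Why clearing _PAGE_USER in the init_mm entries is enough to hide the page: on x86, a user-mode access is only permitted if the U/S bit is set in every paging-structure entry used to translate the address. The sketch below models just that rule; user_can_access() and the flat array of walked entries are hypothetical illustrations, not kernel code, and the model ignores every other permission bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define X86_PAGE_PRESENT (1ULL << 0)
#define X86_PAGE_USER    (1ULL << 2)   /* U/S bit */

/*
 * entries[0..levels-1] are the entries walked to translate one address,
 * top level (PGD) first.  A user-mode access is allowed only if every
 * one of them has the U/S bit set.
 */
static bool user_can_access(const uint64_t *entries, size_t levels)
{
	assert(entries != NULL && levels > 0);
	for (size_t i = 0; i < levels; i++)
		if (!(entries[i] & X86_PAGE_USER))
			return false;
	return true;
}
```

A walk whose PGD entry lacks the U/S bit is denied even when every lower-level entry has it, so stripping the bit from the shared init_mm entries hides the vsyscall page without touching its PTE.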


Re: [PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-09 Thread Andy Lutomirski
On Wed, Nov 8, 2017 at 11:47 AM, Dave Hansen
 wrote:
>
> From: Dave Hansen 
>
> The VSYSCALL page is mapped by kernel page tables at a kernel address.
> It is troublesome to support with KAISER in place, so disable the
> native case.
>
> Also add some help text about how KAISER might affect the emulation
> case as well.

Can you re-explain why this is helpful?

Also, I'm about to send patches that may cause a rethinking of how
KAISER handles the fixmap.

--Andy

>
> Signed-off-by: Dave Hansen 
> Cc: Moritz Lipp 
> Cc: Daniel Gruss 
> Cc: Michael Schwarz 
> Cc: Richard Fellner 
> Cc: Andy Lutomirski 
> Cc: Linus Torvalds 
> Cc: Kees Cook 
> Cc: Hugh Dickins 
> Cc: x...@kernel.org
>
> ---
>
>  b/arch/x86/Kconfig |8 
>  1 file changed, 8 insertions(+)
>
> diff -puN arch/x86/Kconfig~kaiser-no-vsyscall arch/x86/Kconfig
> --- a/arch/x86/Kconfig~kaiser-no-vsyscall   2017-11-08 10:45:39.157681370 -0800
> +++ b/arch/x86/Kconfig  2017-11-08 10:45:39.162681370 -0800
> @@ -2231,6 +2231,9 @@ choice
>
>  	config LEGACY_VSYSCALL_NATIVE
>  		bool "Native"
> +		# The VSYSCALL page comes from the kernel page tables
> +		# and is not available when KAISER is enabled.
> +		depends on ! KAISER
>  		help
>  		  Actual executable code is located in the fixed vsyscall
>  		  address mapping, implementing time() efficiently. Since
> @@ -2248,6 +2251,11 @@ choice
>  		  exploits. This configuration is recommended when userspace
>  		  still uses the vsyscall area.
>
> +		  When KAISER is enabled, the vsyscall area will become
> +		  unreadable.  This emulation option still works, but KAISER
> +		  will make it harder to do things like trace code using the
> +		  emulation.
> +
>  	config LEGACY_VSYSCALL_NONE
>  		bool "None"
>  		help
> _


[PATCH 24/30] x86, kaiser: disable native VSYSCALL

2017-11-08 Thread Dave Hansen

From: Dave Hansen 

The VSYSCALL page is mapped by kernel page tables at a kernel address.
It is troublesome to support with KAISER in place, so disable the
native case.

Also add some help text about how KAISER might affect the emulation
case as well.

Signed-off-by: Dave Hansen 
Cc: Moritz Lipp 
Cc: Daniel Gruss 
Cc: Michael Schwarz 
Cc: Richard Fellner 
Cc: Andy Lutomirski 
Cc: Linus Torvalds 
Cc: Kees Cook 
Cc: Hugh Dickins 
Cc: x...@kernel.org

---

 b/arch/x86/Kconfig |8 
 1 file changed, 8 insertions(+)

diff -puN arch/x86/Kconfig~kaiser-no-vsyscall arch/x86/Kconfig
--- a/arch/x86/Kconfig~kaiser-no-vsyscall   2017-11-08 10:45:39.157681370 -0800
+++ b/arch/x86/Kconfig  2017-11-08 10:45:39.162681370 -0800
@@ -2231,6 +2231,9 @@ choice
 
 	config LEGACY_VSYSCALL_NATIVE
 		bool "Native"
+		# The VSYSCALL page comes from the kernel page tables
+		# and is not available when KAISER is enabled.
+		depends on ! KAISER
 		help
 		  Actual executable code is located in the fixed vsyscall
 		  address mapping, implementing time() efficiently. Since
@@ -2248,6 +2251,11 @@ choice
 		  exploits. This configuration is recommended when userspace
 		  still uses the vsyscall area.
 
+		  When KAISER is enabled, the vsyscall area will become
+		  unreadable.  This emulation option still works, but KAISER
+		  will make it harder to do things like trace code using the
+		  emulation.
+
 	config LEGACY_VSYSCALL_NONE
 		bool "None"
 		help
_
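To see the effect the new help text describes from userspace, a probe like the following (a sketch for x86-64 Linux only) tries to read the first byte of the vsyscall page at its fixed address, 0xffffffffff600000, and catches the fault. vsyscall_page_readable() is a made-up name for this example. One would expect it to return 0 whenever the page is unreadable, including when there is no vsyscall mapping at all; it says nothing about whether calls into the page are still emulated.

```c
#define _POSIX_C_SOURCE 200809L  /* for sigaction() and sigsetjmp() */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(void *) == 8, "the vsyscall page only exists on x86-64");

/* Fixed address of the legacy vsyscall page on x86-64. */
#define VSYSCALL_ADDR 0xffffffffff600000ULL

static sigjmp_buf probe_env;

static void probe_fault(int sig)
{
	(void)sig;
	siglongjmp(probe_env, 1);   /* abandon the faulting read */
}

/*
 * Hypothetical helper: returns 1 if the vsyscall page is readable,
 * 0 if reading it faults, -1 if the fault handlers could not be installed.
 */
static int vsyscall_page_readable(void)
{
	struct sigaction sa, old_segv, old_bus;
	volatile int readable = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = probe_fault;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old_segv) != 0)
		return -1;
	if (sigaction(SIGBUS, &sa, &old_bus) != 0) {
		sigaction(SIGSEGV, &old_segv, NULL);
		return -1;
	}

	if (sigsetjmp(probe_env, 1) == 0) {
		const volatile unsigned char *p =
			(const volatile unsigned char *)(uintptr_t)VSYSCALL_ADDR;
		(void)*p;               /* faults if the page is not readable */
		readable = 1;
	}

	/* Put back whatever handlers were installed before the probe. */
	sigaction(SIGSEGV, &old_segv, NULL);
	sigaction(SIGBUS, &old_bus, NULL);
	return readable;
}
```

sigsetjmp() is called with a nonzero savemask so that siglongjmp() restores the signal mask from before the read (SIGSEGV is blocked while its handler runs), which lets the probe be called more than once in the same process.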

