On Fri, Aug 9, 2019 at 6:11 PM Alistair Francis
wrote:
>
> Update the #defines around sys_fstat64() and sys_fstatat64() to match
> the #defines around the __NR3264_fstatat and __NR3264_fstat definitions
> in include/uapi/asm-generic/unistd.h. This avoids compiler failures if
> one is defined.
verifiers.
Signed-off-by: Andy Lutomirski
---
kernel/bpf/syscall.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 23f8f89d2a86..730afa2be786 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1649,8 +1649,7
In the interest of making bpf() more useful by unprivileged users,
this patch teaches bpf to respect access modes on map and prog
inodes. The permissions are:
R on a map: read the map
W on a map: write the map
Referencing a map from a program should require RW.
R on a prog: Read or introspect
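As a rough illustration of the permission list above, the mapping from operation to required access bits could be sketched as follows. This is a sketch only: the enum, the ACC_* constants and both function names are hypothetical and are not taken from the patch.

```c
/* Hypothetical access bits, modeled on the usual R=4 / W=2 octal convention. */
#define ACC_R 4
#define ACC_W 2

/* Hypothetical operations matching the list in the commit message. */
enum bpf_obj_op {
	MAP_READ,          /* read the map */
	MAP_WRITE,         /* write the map */
	MAP_REF_FROM_PROG, /* reference a map from a program */
	PROG_READ          /* read or introspect a prog */
};

/* Return the access bits an operation would need under the stated rules. */
int required_access(enum bpf_obj_op op)
{
	switch (op) {
	case MAP_READ:
		return ACC_R;
	case MAP_WRITE:
		return ACC_W;
	case MAP_REF_FROM_PROG:
		return ACC_R | ACC_W;
	case PROG_READ:
		return ACC_R;
	}
	/* Unknown operation: fall back to the most restrictive requirement. */
	return ACC_R | ACC_W;
}

/* Nonzero if the granted bits cover everything the operation needs. */
int mode_allows(int granted, enum bpf_obj_op op)
{
	int need = required_access(op);

	return (granted & need) == need;
}
```

With these assumptions, a read-only handle would allow MAP_READ but not MAP_REF_FROM_PROG, which needs both bits.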
is currently the only user in the
kernel outside of mknod() itself that uses it to create regular
(i.e. S_IFREG) files.
Signed-off-by: Andy Lutomirski
---
kernel/bpf/inode.c | 4
1 file changed, 4 deletions(-)
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index cb07736b33ae
Other than the mknod() patch, this is not ready for prime time. These
patches try to make progress toward making bpf() more useful without
privilege
Andy Lutomirski (4):
bpf: Respect persistent map and prog access modes
bpf: Don't require mknod() permission to pin an object
bpf: Add a way
don't want to inadvertently generate audit events for privileges that
are never used.
So it's the idea that counts :)
Signed-off-by: Andy Lutomirski
---
include/linux/bpf.h | 15 +++
include/linux/bpf_verifier.h | 1 +
kernel/bpf/verifier.c| 8
On Wed, Dec 12, 2018 at 6:43 AM Jan Kara wrote:
>
> On Wed 12-12-18 09:17:08, Mickaël Salaün wrote:
> > When the O_MAYEXEC flag is passed, sys_open() may be subject to
> > additional restrictions depending on a security policy implemented by an
> > LSM through the inode_permission hook.
> >
> >
> On Aug 2, 2019, at 3:22 PM, Thomas Gleixner wrote:
>
>> On Fri, 2 Aug 2019, Paolo Bonzini wrote:
>>> On 01/08/19 23:47, Thomas Gleixner wrote:
>>> Right you are about cond_resched() being called, but for SRCU this does not
>>> matter unless there is some way to do a synchronize operation on
On Wed, Jul 31, 2019 at 11:09 PM wrote:
>
> On July 31, 2019 10:34:26 PM PDT, Andy Lutomirski wrote:
> >On Mon, Jul 29, 2019 at 2:58 PM Dmitry Safonov wrote:
> >>
> >> As it has been discussed on timens RFC, adding a new conditional
> >branch
> >> `i
> On Aug 1, 2019, at 7:48 AM, Peter Zijlstra wrote:
>
>> On Thu, Aug 01, 2019 at 04:32:51PM +0200, Thomas Gleixner wrote:
>> +#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
>> +/**
>> + * tracehook_handle_notify_resume - Notify resume handling for virt
>> + *
>> + * Called with interrupts and preemption
On Mon, Jul 29, 2019 at 2:58 PM Dmitry Safonov wrote:
>
> As it has been discussed on timens RFC, adding a new conditional branch
> `if (inside_time_ns)` on VDSO for all processes is undesirable.
> It will add a penalty for everybody as branch predictor may mispredict
> the jump. Also there are
On Mon, Jul 29, 2019 at 2:58 PM Dmitry Safonov wrote:
>
> From: Andrei Vagin
>
> Time Namespace isolates clock values.
> +static int timens_install(struct nsproxy *nsproxy, struct ns_common *new)
> +{
> + struct time_namespace *ns = to_time_ns(new);
> +
> + if
On Mon, Jul 29, 2019 at 2:58 PM Dmitry Safonov wrote:
>
> Although the time namespace can work with a VVAR VMA split, it seems
> worthwhile to forbid splitting VVAR, resulting in a stricter ABI and
> reducing the number of corner cases to consider while working further on VDSO.
>
> I don't think there is any
On Mon, Jul 29, 2019 at 2:58 PM Dmitry Safonov wrote:
>
> From: Andrei Vagin
>
> As modern applications fetch time from VDSO without entering the kernel,
> it's needed to provide offsets for userspace code inside time namespace.
>
> A page for timens offsets is allocated on time namespace
On Mon, Jul 29, 2019 at 2:58 PM Dmitry Safonov wrote:
>
> From: Andrei Vagin
>
> As it has been discussed on timens RFC, adding a new conditional branch
> `if (inside_time_ns)` on VDSO for all processes is undesirable.
>
> Addressing those problems, there are two versions of VDSO's .so:
> for
On Tue, Jul 30, 2019 at 2:39 AM Thomas Gleixner wrote:
>
> To address the regression which causes seccomp to deny applications the
> access to clock_gettime64() and clock_getres64() syscalls because they
> are not enabled in the existing filters.
>
> That trips over the fact that 32bit VDSOs use
> On Jul 29, 2019, at 8:03 AM, Peter Zijlstra wrote:
>
>> On Mon, Jul 29, 2019 at 10:51:51AM -0400, Waiman Long wrote:
>>> On 7/29/19 4:52 AM, Peter Zijlstra wrote:
On Sat, Jul 27, 2019 at 01:10:47PM -0400, Waiman Long wrote:
It was found that a dying mm_struct where the owning task
On Sun, Jul 28, 2019 at 6:08 PM Eiichi Tsukata wrote:
>
> If context tracking is enabled, causing page fault in preemptirq
> irq_enable or irq_disable events triggers the following RCU EQS warning.
>
Yuck.
> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index
On Sun, Jul 28, 2019 at 6:59 PM Daniel Axtens wrote:
>
> Currently, when a kernel stack overflow is detected via VMAP_STACK,
> the task is killed with die().
>
> This isn't safe, because we don't know how that process has affected
> kernel state. In particular, we don't know what locks have been
mplement the 32bit variants which use the legacy syscalls and select the
> variant in the core library.
>
> The 64bit time variants are not removed because they are required for the
> time64 based vdso accessors.
Reviewed-by: Andy Lutomirski
at 32bit VDSOs use the new clock_gettime64() and
> clock_getres64() syscalls in the fallback path.
>
> Implement __cvdso_clock_get*time32() variants which invoke the legacy
> 32bit syscalls when the architecture requests it.
>
> The conditional can go away once all architectur
; syscall fallback in the 64bit and 32bit variants.
>
> Preparatory work for using legacy syscalls in 32bit VDSO. No functional
> change.
Reviewed-by: Andy Lutomirski
ast
> which only works because the pointer is NULL.
Reviewed-by: Andy Lutomirski
FWIW, the equivalent change to gettimeofday would be an ABI break,
since we historically have that check, and it even makes sense there.
On Sat, Jul 27, 2019 at 2:52 PM Thomas Gleixner wrote:
>
> On Sat, 27 Jul 2019, Thomas Gleixner wrote:
> > On Sat, 27 Jul 2019, Andy Lutomirski wrote:
> > >
> > > I think it's getting quite late to start inventing new seccomp
> > > features to fix th
On Fri, Jul 26, 2019 at 11:01 AM Sean Christopherson
wrote:
>
> +cc Paul
>
> On Wed, Jul 24, 2019 at 01:56:34AM +0200, Thomas Gleixner wrote:
> > On Tue, 23 Jul 2019, Kees Cook wrote:
> >
> > > On Wed, Jul 24, 2019 at 12:59:03AM +0200, Thomas Gleixner wrote:
> > > > And as we have
On Fri, Jul 26, 2019 at 10:52 PM Sean Christopherson
wrote:
>
> Add an SGX device to enable userspace to allocate EPC without an
> associated enclave. The intended and only known use case for direct EPC
> allocation is to expose EPC to a KVM guest, hence the virt_epc moniker,
> virt.{c,h} files
On Fri, Jul 26, 2019 at 10:52 PM Sean Christopherson
wrote:
>
> The SGX subsystem restricts access to a subset of enclave attributes to
> provide additional security for an uncompromised kernel, e.g. to prevent
> malware from using the PROVISIONKEY to ensure its nodes are running
> inside a
On Fri, Jul 26, 2019 at 10:52 PM Sean Christopherson
wrote:
>
> Similar to the existing AMD #NPF case where emulation of the current
> instruction is not possible due to lack of information, virtualization
> of Intel SGX will introduce a scenario where emulation is not possible
> due to the
A data breakpoint near the top of an IST stack will cause unrecoverable
recursion. A data breakpoint on the GDT, IDT, or TSS is terrifying.
Prevent either of these from happening.
Co-developed-by: Peter Zijlstra
Signed-off-by: Andy Lutomirski
---
The rest of my series is still in progress
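The check described in the commit message above amounts to asking whether a breakpoint's byte range overlaps a protected region. A minimal sketch of such an overlap test follows; the function name and parameters are hypothetical and this is not the code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the breakpoint range
 * [bp_addr, bp_addr + bp_len) shares at least one byte with the
 * protected region [base, base + size).
 */
bool bp_overlaps(uintptr_t bp_addr, uintptr_t bp_len,
		 uintptr_t base, uintptr_t size)
{
	uintptr_t bp_last, last;

	/* An empty range cannot overlap anything. */
	if (bp_len == 0 || size == 0)
		return false;

	/* Compare inclusive last bytes so the end of a range cannot wrap. */
	bp_last = bp_addr + bp_len - 1;
	last = base + size - 1;

	return bp_addr <= last && base <= bp_last;
}
```

A caller would run this once per protected region (for example an IST stack, the GDT, the IDT or the TSS) and refuse to install the breakpoint if any call returns true.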
On Tue, Jul 23, 2019 at 2:55 PM Kees Cook wrote:
>
> On Mon, Jul 22, 2019 at 04:47:36PM -0700, Andy Lutomirski wrote:
> > On Mon, Jul 22, 2019 at 4:28 PM Kees Cook wrote:
> > > I've built a straw-man for this idea... but I have to say I don't
> > > like it. This
> On Jul 23, 2019, at 2:18 AM, Peter Zijlstra wrote:
>
>> On Mon, Jul 22, 2019 at 04:47:36PM -0700, Andy Lutomirski wrote:
>>
>> I don't love this whole concept, but I also don't have a better idea.
>
> Are we really talking about changing the kernel bec
On Mon, Jul 22, 2019 at 4:28 PM Kees Cook wrote:
>
> On Mon, Jul 22, 2019 at 12:17:16PM -0700, Andy Lutomirski wrote:
> > On Mon, Jul 22, 2019 at 11:39 AM Kees Cook wrote:
> > >
> > > On Mon, Jul 22, 2019 at 08:31:32PM +0200, Thomas Gleixner wrote:
> > >
On Mon, Jul 22, 2019 at 11:39 AM Kees Cook wrote:
>
> On Mon, Jul 22, 2019 at 08:31:32PM +0200, Thomas Gleixner wrote:
> > On Mon, 22 Jul 2019, Kees Cook wrote:
> > > Just so I'm understanding: the vDSO change introduced code to make an
> > > actual syscall on i386, which for most seccomp filters
Commit-ID: 6365b842aae4490ebfafadfc6bb27a6d3cc54757
Gitweb: https://git.kernel.org/tip/6365b842aae4490ebfafadfc6bb27a6d3cc54757
Author: Andy Lutomirski
AuthorDate: Wed, 3 Jul 2019 13:34:04 -0700
Committer: Thomas Gleixner
CommitDate: Mon, 22 Jul 2019 10:31:23 +0200
x86/syscalls: Split
Commit-ID: f85a8573ceb225e606fcf38a9320782316f47c71
Gitweb: https://git.kernel.org/tip/f85a8573ceb225e606fcf38a9320782316f47c71
Author: Andy Lutomirski
AuthorDate: Wed, 3 Jul 2019 13:34:03 -0700
Committer: Thomas Gleixner
CommitDate: Mon, 22 Jul 2019 10:31:22 +0200
x86/syscalls
Commit-ID: a8d03c3f300eefff3b5c14798409e4b43e37dd9b
Gitweb: https://git.kernel.org/tip/a8d03c3f300eefff3b5c14798409e4b43e37dd9b
Author: Andy Lutomirski
AuthorDate: Wed, 3 Jul 2019 13:34:02 -0700
Committer: Thomas Gleixner
CommitDate: Mon, 22 Jul 2019 10:31:22 +0200
x86/syscalls: Use
Commit-ID: 45e29d119e9923ff14dfb840e3482bef1667bbfb
Gitweb: https://git.kernel.org/tip/45e29d119e9923ff14dfb840e3482bef1667bbfb
Author: Andy Lutomirski
AuthorDate: Wed, 3 Jul 2019 13:34:05 -0700
Committer: Thomas Gleixner
CommitDate: Mon, 22 Jul 2019 10:31:22 +0200
x86/syscalls: Make
Commit-ID: 229b969b3d38bc28bcd55841ee7ca9a9afb922f3
Gitweb: https://git.kernel.org/tip/229b969b3d38bc28bcd55841ee7ca9a9afb922f3
Author: Andy Lutomirski
AuthorDate: Sun, 14 Jul 2019 08:23:14 -0700
Committer: Thomas Gleixner
CommitDate: Mon, 22 Jul 2019 10:12:32 +0200
x86/apic
On Fri, Jul 19, 2019 at 11:54 AM Nadav Amit wrote:
>
> > On Jul 19, 2019, at 11:48 AM, Dave Hansen wrote:
> >
> > On 7/19/19 11:43 AM, Nadav Amit wrote:
> >> Andy said that for the lazy tlb optimizations there might soon be more
> >> shared state. If you prefer, I can move is_lazy outside of
On Fri, Jul 19, 2019 at 8:59 PM Eiichi Tsukata wrote:
>
>
> On 2019/07/19 5:27, Andy Lutomirski wrote:
> > Hi all-
> >
> > I suspect that a bunch of the bugs you're all finding boil down to:
> >
> > - Nested debug exceptions could corrupt the outer exceptio
> On Jul 19, 2019, at 1:03 PM, Sean Christopherson
> wrote:
>
> The generic vDSO implementation, starting with commit
>
> 7ac870747988 ("x86/vdso: Switch to generic vDSO implementation")
>
> breaks seccomp-enabled userspace on 32-bit x86 (i386) kernels. Prior to
> the generic
On Fri, Jul 19, 2019 at 5:21 AM Joerg Roedel wrote:
>
> On Thu, Jul 18, 2019 at 12:04:49PM -0700, Andy Lutomirski wrote:
> > I find it problematic that there is no meaningful documentation as to
> > what vmalloc_sync_all() is supposed to do.
>
> Yeah, I found that too,
Hi all-
I suspect that a bunch of the bugs you're all finding boil down to:
- Nested debug exceptions could corrupt the outer exception's DR6.
- Nested debug exceptions in which *both* exceptions came from the
kernel were probably all kinds of buggy
- Data breakpoints in bad places in the
On Thu, Jul 18, 2019 at 2:17 AM Joerg Roedel wrote:
>
> Hi Andy,
>
> On Wed, Jul 17, 2019 at 02:24:09PM -0700, Andy Lutomirski wrote:
> > On Wed, Jul 17, 2019 at 12:14 AM Joerg Roedel wrote:
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > ind
On Wed, Jul 17, 2019 at 12:14 AM Joerg Roedel wrote:
>
> From: Joerg Roedel
>
> On x86-32 with PTI enabled, parts of the kernel page-tables
> are not shared between processes. This can cause mappings in
> the vmalloc/ioremap area to persist in some page-tables
> after the region is unmapped and
On Tue, Jul 16, 2019 at 2:53 PM Vegard Nossum wrote:
>
>
> On 7/16/19 9:33 PM, Vegard Nossum wrote:
> >
> > On 7/11/19 1:40 PM, Peter Zijlstra wrote:
> >> Hi,
> >>
> >> Here's the latest (and hopefully final) set of tracing vs CR2 patches.
> >>
> >> They are basically the same as v2, with only
On Mon, Jul 15, 2019 at 4:30 PM Andrew Cooper wrote:
>
> On 15/07/2019 19:17, Nadav Amit wrote:
> >> On Jul 15, 2019, at 8:16 AM, Andrew Cooper
> >> wrote:
> >>
> >> There is a lot of infrastructure for functionality which is used
> >> exclusively in __{save,restore}_processor_state() on the
On Mon, Jul 15, 2019 at 3:53 PM Andi Kleen wrote:
>
> > I haven't tested on a real kernel with i915. Does i915 really hit
> > this code path? Does it happen more than once or twice at boot?
>
> Yes some workloads allocate/free a lot of write combined memory
> for graphics objects.
>
But where
On Mon, Jul 15, 2019 at 12:38 PM Andi Kleen wrote:
>
> >
> > That does not answer the question whether it's worthwhile to do that.
>
> It's likely worthwhile for (Intel integrated) graphics.
>
> There was also a recent issue with 3dxp/dax, which uses ioremap in some
> cases.
>
FWIW, I applied
On Mon, Jul 15, 2019 at 12:39 PM Andi Kleen wrote:
>
> > Right, we don't know where the PAT invocation comes from and whether they
> > are safe to omit flushing the cache. The module load code would be one
> > obvious candidate.
>
> Module load just changes the writable/executable status, right?
On Mon, Jul 15, 2019 at 9:34 AM Andi Kleen wrote:
>
> Juergen Gross writes:
>
> > The long term plan has been to replace Xen PV guests by PVH. The first
> > victim of that plan are now 32-bit PV guests, as those are used only
> > rather seldom these days. Xen on x86 requires 64-bit support and
Commit-ID: c7ca0b614513afba57824cae68447f9c32b1ee61
Gitweb: https://git.kernel.org/tip/c7ca0b614513afba57824cae68447f9c32b1ee61
Author: Andy Lutomirski
AuthorDate: Mon, 15 Jul 2019 07:21:44 -0700
Committer: Thomas Gleixner
CommitDate: Mon, 15 Jul 2019 17:12:31 +0200
Revert "x86/p
On Mon, Jul 15, 2019 at 7:21 AM Uros Bizjak wrote:
>
> CPUs which have self-snooping capability can handle conflicting
> memory type across CPUs by snooping its own cache. Commit #fd329f276ecaa
> ("x86/mtrr: Skip cache flushes on CPUs with cache self-snooping")
> avoids cache flushes when MTRR
On Mon, Jul 15, 2019 at 6:23 AM Juergen Gross wrote:
>
> On 15.07.19 15:00, Andrew Cooper wrote:
> > There is a lot of infrastructure for functionality which is used
> > exclusively in __{save,restore}_processor_state() on the suspend/resume
> > path.
> >
> > cr8 is an alias of APIC_TASKPRI, and
modifies the test case so that it tests the preexisting
behavior.
Signed-off-by: Andy Lutomirski
---
arch/x86/kernel/ptrace.c | 14 --
tools/testing/selftests/x86/fsgsbase.c | 22 --
2 files changed, 16 insertions(+), 20 deletions(-)
diff --git a/arch
IOMMU
that remaps interrupts. The purpose of this patch is to reduce the
chance that a certain class of device malfunctions crashes the
kernel in hard-to-debug ways.
Cc: Nadav Amit
Cc: Stephane Eranian
Cc: Feng Tang
Suggested-by: Andrew Cooper
Signed-off-by: Andy Lutomirski
---
arch/x86/kernel
On Fri, Jul 12, 2019 at 12:06 PM Peter Zijlstra wrote:
>
> On Fri, Jul 12, 2019 at 06:37:47PM +0200, Alexandre Chartre wrote:
> > On 7/12/19 5:16 PM, Thomas Gleixner wrote:
>
> > > Right. If we decide to expose more parts of the kernel mappings then
> > > that's
> > > just adding more stuff to
> On Jul 12, 2019, at 10:37 AM, Alexandre Chartre
> wrote:
>
>
>
>> On 7/12/19 5:16 PM, Thomas Gleixner wrote:
>>> On Fri, 12 Jul 2019, Peter Zijlstra wrote:
On Fri, Jul 12, 2019 at 01:56:44PM +0200, Alexandre Chartre wrote:
I think that's precisely what makes ASI and PTI
On Fri, Jul 12, 2019 at 6:45 AM Alexandre Chartre
wrote:
>
>
> On 7/12/19 2:50 PM, Peter Zijlstra wrote:
> > On Fri, Jul 12, 2019 at 01:56:44PM +0200, Alexandre Chartre wrote:
> >
> >> I think that's precisely what makes ASI and PTI different and independent.
> >> PTI is just about switching
On Thu, Jul 11, 2019 at 5:01 PM Carlo Wood wrote:
>
> I believe that the only safe solution is to let the Event Loop
> Thread do the deleting. So, if all else fails I'll have to add
> objects that a Worker Thread thinks need to be deleted to a
> FIFO that is processed by the Event Loop Thread
> On Jul 11, 2019, at 8:25 AM, Alexandre Chartre
> wrote:
>
> Address space isolation should be aborted if there is an interrupt,
> an exception or a context switch. Interrupt/exception handlers and
> context switch code need to run with the full kernel address space.
> Address space
On Thu, Jul 11, 2019 at 4:51 AM Peter Zijlstra wrote:
>
> Since INT3/#BP no longer runs on an IST, this workaround is no longer
> required.
>
> Tested by running lockdep+ftrace as described in the initial commit:
>
> 5963e317b1e9 ("ftrace/x86: Do not change stacks in DEBUG when calling
>
On Thu, Jul 11, 2019 at 1:13 AM Uros Bizjak wrote:
>
> Recent patch [1] disabled a self-snoop feature on a list of processor
> models with known errata, so we are confident that the feature
> should work on remaining models also for other purposes than to speed
> up MTRR programming.
>
> I
> On Jul 9, 2019, at 1:24 AM, Pingfan Liu wrote:
>
>> On Tue, Jul 9, 2019 at 2:12 PM Thomas Gleixner wrote:
>>
>>> On Tue, 9 Jul 2019, Pingfan Liu wrote:
On Mon, Jul 8, 2019 at 5:35 PM Thomas Gleixner wrote:
It can and it does.
That's the whole point why we bring up
On Mon, Jul 8, 2019 at 11:29 AM Dmitry V. Levin wrote:
>
> The syscall entry/exit is now exposed via PTRACE_GETEVENTMSG,
> update the test accordingly.
Reviewed-by: Andy Lutomirski
> On Jul 8, 2019, at 3:35 AM, Thomas Gleixner wrote:
>
>> On Mon, 8 Jul 2019, Pingfan Liu wrote:
>>> On Mon, Jul 8, 2019 at 3:44 AM Thomas Gleixner wrote:
>>>
On Fri, 5 Jul 2019, Pingfan Liu wrote:
I hit a bug on an AMD machine, with kexec -l nr_cpus=4 option. nr_cpus
On Sun, Jul 7, 2019 at 8:10 AM Andy Lutomirski wrote:
>
> On Thu, Jul 4, 2019 at 1:03 PM Peter Zijlstra wrote:
> >
> > Despite the current efforts to read CR2 before tracing happens there
> > still exist a number of possible holes:
> >
> > idtentry
read_cr2(); /* whoopsie */
>
> And similar for i386.
>
> Fix it by pulling the CR2 read into the entry code, before any of that
> stuff gets a chance to run and ruin things.
Reviewed-by: Andy Lutomirski
Subject to the discussion as to whether this is the right approach at all.
> On Jul 6, 2019, at 6:08 PM, Linus Torvalds
> wrote:
>
> On Sat, Jul 6, 2019 at 3:41 PM Linus Torvalds
> wrote:
>>
>>> On Sat, Jul 6, 2019 at 3:27 PM Steven Rostedt wrote:
>>>
>>> We also have to deal with reading vmalloc'd data as that can fault too.
>>
>> Ahh, that may be a better
On Sat, Jul 6, 2019 at 3:27 PM Steven Rostedt wrote:
>
> On Sat, 6 Jul 2019 14:41:22 -0700
> Linus Torvalds wrote:
>
> > On Fri, Jul 5, 2019 at 6:50 AM Peter Zijlstra wrote:
> > >
> > > Also; all previous attempts at fixing this have been about pushing the
> > > read_cr2() earlier; notably:
> >
On Fri, Jul 5, 2019 at 1:36 PM Thomas Gleixner wrote:
>
> On Fri, 5 Jul 2019, Andy Lutomirski wrote:
> > On Fri, Jul 5, 2019 at 8:47 AM Andrew Cooper
> > wrote:
> > > Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX
> > > without an er
On Fri, Jul 5, 2019 at 1:25 PM Thomas Gleixner wrote:
>
> Andrew,
>
> >
> > These can be addressed by setting TPR to 0x10, which will inhibit
>
> Right, that's easy and obvious.
>
This boots:
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 177aa8ef2afa..5257c40bde6c
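For the TPR point quoted above, a minimal sketch of the priority-class rule may help. It assumes the usual local APIC behavior that a fixed interrupt is delivered only when its priority class (vector bits 7:4) is greater than the class in TPR bits 7:4. The function names are hypothetical, and special interrupt types such as NMI are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Priority class of a vector or of a TPR value: bits 7:4. */
static inline unsigned int prio_class(uint8_t v)
{
	return v >> 4;
}

/*
 * Hypothetical helper: true if a fixed interrupt with this vector
 * would be held back while the task priority register holds tpr.
 */
bool tpr_inhibits(uint8_t tpr, uint8_t vector)
{
	return prio_class(vector) <= prio_class(tpr);
}
```

Under this rule, TPR = 0x10 holds back vectors 0x10 through 0x1f, which covers #AC (17), #CP (21), #VC (29) and #SX (30), while ordinary device vectors from 0x20 upward are still delivered.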
On Fri, Jul 5, 2019 at 8:47 AM Andrew Cooper wrote:
>
> On 04/07/2019 16:51, Thomas Gleixner wrote:
> > 2) The loop termination logic is interesting at best.
> >
> > If the machine has no TSC or cpu_khz is not known yet it tries 1
> > million times to ack stale IRR/ISR bits. What?
> >
> On Jul 4, 2019, at 7:18 PM, Linus Torvalds
> wrote:
>
>> On Fri, Jul 5, 2019 at 5:03 AM Peter Zijlstra wrote:
>>
>> Despite the current efforts to read CR2 before tracing happens there
>> still exist a number of possible holes:
>
> So this whole series disturbs me for the simple reason
On Thu, Jul 4, 2019 at 1:03 PM Peter Zijlstra wrote:
>
>
Acked-by: Andy Lutomirski
/* get error code */
> - movq $-1, ORIG_RAX(%rsp) /* no syscall to restart */
> - .else
> - xorl %esi, %esi /* no error code */
> + idtentry_part \do_sym, \has_error_code, 0
Nice! You are adding an extra UNWIND_HINT_REGS that wasn't here
before, but I think that's fine. However, can you please make it
paranoid=0 instead of just 0? You could go all the way verbose and
say do_sym=\do_sym, etc, but that seems like overkill.
Other than that nitpick, Acked-by: Andy Lutomirski
--Andy
On Thu, Jul 4, 2019 at 1:03 PM Peter Zijlstra wrote:
>
> By adding one more option to SAVE_ALL we can make use of it in
> common_exception and simplify things. This saves duplication later
> where page_fault will no longer use common_exception.
>
Reviewed-by: Andy Lutomirski
Alt
On Thu, Jul 4, 2019 at 1:03 PM Peter Zijlstra wrote:
>
> The one paravirt read_cr2() implementation (Xen) is actually quite
> trivial and doesn't need to clobber anything other than the return
> register. By making read_cr2() CALLEE_SAVE we avoid all the PUSH/POP
> nonsense and allow more
On Wed, Jul 3, 2019 at 3:01 PM Peter Zijlstra wrote:
>
> On Wed, Jul 03, 2019 at 01:27:09PM -0700, Andy Lutomirski wrote:
> > On Wed, Jul 3, 2019 at 3:28 AM root wrote:
>
> > > @@ -1338,18 +1347,9 @@ ENTRY(error_entry)
> > > movq %rax, %rsp
A "compat" entry in the syscall tables means to use a different
entry on 32-bit and 64-bit builds. This only makes sense for
syscalls that exist in the first place in 32-bit builds, so disallow
it for anything other than i386.
Signed-off-by: Andy Lutomirski
---
arch/x86/entr
the compat versions.
sendfile64() is more complicated, and I'll address it separately.
Signed-off-by: Andy Lutomirski
---
arch/x86/entry/syscalls/syscall_32.tbl | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl
b/arch/x86/entry/syscalls
that
need special handling on x32 can share the same number on x32 and
x86_64. This means that the special syscall range 512-547 can be
treated as a legacy wart instead of something that may need to be
extended in the future.
This patch also adds a selftest to verify the new behavior.
Signed-of
:)
Andy Lutomirski (4):
x86/syscalls: Use the compat versions of rt_sigsuspend() and
rt_sigprocmask()
x86/syscalls: Disallow compat entries for all types of 64-bit syscalls
x86/syscalls: Split the x32 syscalls into their own table
x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long
arch
to be. Syscall numbers
are, for all practical purposes, unsigned long, so make
__X32_SYSCALL_BIT be unsigned long.
Signed-off-by: Andy Lutomirski
---
arch/x86/include/uapi/asm/unistd.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/uapi/asm/unistd.h
b/arch/x86/include
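To see why the type of the bit matters when syscall numbers are treated as 64-bit values, here is a small sketch of plain C integer conversion rules. The macro and function names are hypothetical stand-ins, not the kernel's definitions, and the 64-bit constant stands in for unsigned long on an LP64 target.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the same bit with three different types. */
#define BIT_AS_INT  0x40000000            /* type int */
#define BIT_AS_UINT 0x40000000U           /* type unsigned int */
#define BIT_AS_WIDE UINT64_C(0x40000000)  /* 64-bit unsigned */

/* ~int is a negative int; converting it to 64 bits sign-extends the mask. */
uint64_t clear_bit_int(uint64_t nr)
{
	return nr & ~BIT_AS_INT;
}

/* ~unsigned int zero-extends, so the mask also clears bits 32..63. */
uint64_t clear_bit_uint(uint64_t nr)
{
	return nr & ~BIT_AS_UINT;
}

/* With a 64-bit constant the mask is full width with no conversion. */
uint64_t clear_bit_wide(uint64_t nr)
{
	return nr & ~BIT_AS_WIDE;
}
```

The int variant produces the intended result only through sign extension, and the unsigned int variant silently drops the upper half of the number. A constant that already has the width of the syscall number avoids relying on either conversion.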
On Wed, Jul 3, 2019 at 3:28 AM root wrote:
>
> Despite the current efforts to read CR2 before tracing happens there
> still exist a number of possible holes:
>
> idtentry page_fault do_page_fault has_error_code=1
> call error_entry
> TRACE_IRQS_OFF
> call
I don't know why mpx_mini_test tries to reprogram ftrace, but it
seems rude and it makes the test crash if run as non-root. Comment
it out.
Cc: Dave Hansen
Signed-off-by: Andy Lutomirski
---
tools/testing/selftests/x86/mpx-mini-test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion
Commit-ID: 697096b1f458fb81212d1c82d7846e932455
Gitweb: https://git.kernel.org/tip/697096b1f458fb81212d1c82d7846e932455
Author: Andy Lutomirski
AuthorDate: Tue, 2 Jul 2019 20:43:04 -0700
Committer: Thomas Gleixner
CommitDate: Wed, 3 Jul 2019 16:24:56 +0200
selftests/x86
Andi Kleen
Cc: H. Peter Anvin
Cc: "BaeChang Seok"
Signed-off-by: Andy Lutomirski
---
Changes from v1:
- Fix one more 0x7 (Chang)
- This time, I explicitly tested it with modify_ldt() disabled
tools/testing/selftests/x86/fsgsbase.c | 74 ++
1 file changed,
: Andi Kleen
Cc: Ravi Shankar
Cc: H. Peter Anvin
Cc: "BaeChang Seok"
Signed-off-by: Andy Lutomirski
---
tools/testing/selftests/x86/fsgsbase.c | 72 ++
1 file changed, 39 insertions(+), 33 deletions(-)
diff --git a/tools/testing/selftests/x86/fsgsbase.c
Commit-ID: 539bca535decb11a0861b6205c6684b8e908589b
Gitweb: https://git.kernel.org/tip/539bca535decb11a0861b6205c6684b8e908589b
Author: Andy Lutomirski
AuthorDate: Mon, 1 Jul 2019 20:43:21 -0700
Committer: Thomas Gleixner
CommitDate: Tue, 2 Jul 2019 08:45:20 +0200
x86/entry/64: Fix
Commit-ID: dffb3f9db6b593f3ed6ab4c8d8f10e0aa6aa7a88
Gitweb: https://git.kernel.org/tip/dffb3f9db6b593f3ed6ab4c8d8f10e0aa6aa7a88
Author: Andy Lutomirski
AuthorDate: Mon, 1 Jul 2019 20:43:20 -0700
Committer: Thomas Gleixner
CommitDate: Tue, 2 Jul 2019 08:45:20 +0200
x86/entry/64: Don't
Commit-ID: 9402eaf4c11f0b892eda7b2bcb4654ab34ce34f9
Gitweb: https://git.kernel.org/tip/9402eaf4c11f0b892eda7b2bcb4654ab34ce34f9
Author: Andy Lutomirski
AuthorDate: Mon, 1 Jul 2019 20:43:19 -0700
Committer: Thomas Gleixner
CommitDate: Tue, 2 Jul 2019 08:45:20 +0200
selftests/x86: Test
On Mon, Jul 1, 2019 at 8:43 PM Andy Lutomirski wrote:
>
> In -tip, if FSGSBASE and PTI are on, the kernel crashes if SYSENTER
> happens with TF set. It also crashes if a non-NMI paranoid
> entry happens for any other reason from kernel mode with user GSBASE
> and user CR3,
minutes while debugging this wondering whether I was
accidentally triggering ignore_sysret.
Andy Lutomirski (3):
selftests/x86: Test SYSCALL and SYSENTER manually with TF set
x86/entry/64: Don't compile ignore_sysret if 32-bit emulation is
enabled
x86/entry/64: Fix and clean up paranoid_exit
Kleen
Cc: Ravi Shankar
Cc: "Bae, Chang Seok"
Signed-off-by: Andy Lutomirski
---
arch/x86/entry/entry_64.S | 33 +
1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 54b1b0468b2b.
It's only used if !CONFIG_IA32_EMULATION, so disable it in normal
configs. This will save a few bytes of text and reduce confusion.
Cc: "Bae, Chang Seok"
Signed-off-by: Andy Lutomirski
---
arch/x86/entry/entry_64.S | 6 ++
1 file changed, 6 insertions(+)
diff --git a/arch
kernel.org
Signed-off-by: Andy Lutomirski
---
tools/testing/selftests/x86/Makefile | 5 +-
.../testing/selftests/x86/syscall_arg_fault.c | 112 +-
2 files changed, 110 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/x86/Makefile
b/tools/testing/sel
#PF(0x%lx)\n",
> + printf("[FAIL]\tExecution failed with the wrong error:
> #PF(0x%lx)\n",
> segv_err);
> return 1;
> }
> --
> 2.20.1
>
Acked-by: Andy Lutomirski
On Fri, Jun 28, 2019 at 11:47 AM Matthew Garrett wrote:
>
> On Thu, Jun 27, 2019 at 4:27 PM Andy Lutomirski wrote:
> > They're really quite similar in my mind. Certainly some things in the
> > "integrity" category give absolutely trivial control over the kernel
>
On Fri, Jun 28, 2019 at 12:50 PM Yu-cheng Yu wrote:
>
> The IBT bitmap is visible from user-mode, but not writable.
>
> Signed-off-by: Yu-cheng Yu
>
> ---
> arch/x86/mm/fault.c | 7 +++
> 1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index