bpf: restrict unknown scalars of mixed signed bounds
> for unprivileged")
> Signed-off-by: Samuel Mendoza-Jonas
> Reviewed-by: Frank van der Linden
> Reviewed-by: Ethan Chen
> ---
Thanks for catching it :)
Reviewed-by: Balbir Singh
>
> Signed-off-by: Chunguang Xu
The approach seems to make sense, but the test robot has found
a few issues; can you correct those as applicable, please?
Balbir Singh.
> - delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
> + delayacct_clear_flag(current, DELAYACCT_PF_SWAPIN);
> goto out_release;
> }
>
> locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
>
> - delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
> + delayacct_clear_flag(current, DELAYACCT_PF_SWAPIN);
> if (!locked) {
> ret |= VM_FAULT_RETRY;
> goto out_release;
Acked-by: Balbir Singh
The changes seem reasonable to me. I don't maintain a git tree; Andrew, can we
please queue them up in your tree?
Balbir Singh.
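For context, a minimal sketch of the per-task calling convention the hunks
above introduce (inferred from the diff; swapin_one_page() is a hypothetical
placeholder, not from the patch):

#include <linux/delayacct.h>

/*
 * Sketch only: the delayacct flag helpers now take the task
 * explicitly instead of assuming current. swapin_one_page() is a
 * hypothetical stand-in for the readahead/lock work in the hunks.
 */
static int swapin_with_delay_accounting(struct page **pagep)
{
	int ret;

	delayacct_set_flag(current, DELAYACCT_PF_SWAPIN);
	ret = swapin_one_page(pagep);
	delayacct_clear_flag(current, DELAYACCT_PF_SWAPIN);

	return ret;
}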
>
> But again, this is a useful discussion to have, but I don't quite see
> why it's relevant to Muchun's patches. They're purely an optimization.
>
> So I'd like to clear that up first before going further.
>
I suspect a lot of the issue really is the lack of lockstepping
between a page (unmapped page cache) and the corresponding memcgroup's
lifecycle. When we delete a memcgroup, we sort of lose accounting
(depending on the inheriting parent), and ideally we want to bring back
the accounting when the page is reused in a different cgroup (almost
like first touch). I would like to look at the patches and see if they
do solve the issue that leads to zombie cgroups hanging around. In my
experience, the combination of namespaces and a large number of cgroups
(several of which could be zombies) does not scale well.
Balbir Singh.
On Mon, Mar 29, 2021 at 12:55:15PM +1100, Alistair Popple wrote:
> On Friday, 26 March 2021 4:15:36 PM AEDT Balbir Singh wrote:
> > On Fri, Mar 26, 2021 at 12:20:35PM +1100, Alistair Popple wrote:
> > > +static int __region_intersects(resource_size_t
>
> - if (dev)
> - res = devm_request_mem_region(dev, addr, size, name);
> - else
> - res = request_mem_region(addr, size, name);
> - if (!res)
> - return ERR_PTR(-ENOMEM);
> + if (!request_region_locked(&iomem_resource, res, addr, size, name, 0))
> + break;
> +
> res->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY;
> + if (dev) {
> + dr->parent = &iomem_resource;
> + dr->start = addr;
> + dr->n = size;
> + devres_add(dev, dr);
> + }
> +
> + write_unlock(&resource_lock);
> return res;
> }
>
> + write_unlock(&resource_lock);
> + free_resource(res);
> +
> return ERR_PTR(-ERANGE);
> }
>
Balbir Singh.
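For readers without the full patch, the removed branch above is the
long-standing devm/non-devm request pattern; reconstructed from the "-"
lines for illustration only:

#include <linux/device.h>
#include <linux/err.h>
#include <linux/ioport.h>

/*
 * Pre-patch pattern, per the removed lines above: use the
 * device-managed variant when a struct device is available so the
 * region is released automatically on driver detach, otherwise fall
 * back to the unmanaged API.
 */
static struct resource *request_private_region(struct device *dev,
					       resource_size_t addr,
					       resource_size_t size,
					       const char *name)
{
	struct resource *res;

	if (dev)
		res = devm_request_mem_region(dev, addr, size, name);
	else
		res = request_mem_region(addr, size, name);
	if (!res)
		return ERR_PTR(-ENOMEM);

	res->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY;
	return res;
}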
ll code can be accumulated under a single
hierarchy. May not be worth the effort, just thinking out loud.
Balbir Singh
> >
> > assuming we have
> > #define VMEMMAP_END R_VMEMMAP_END
> > and ditto for hash we probably need
> >
> > BUILD_BUG_ON(VMEMMAP_END + KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);
>
> Sorry, I'm not sure what this is supposed to be testing? In what
> situation would this trigger?
>
I am a bit concerned that we have hard-coded (IIRC) 0xa80e... in the
config; any changes to VMEMMAP_END or KASAN_SHADOW_OFFSET/END
should be guarded.
Balbir Singh.
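A compile-time guard along those lines might look like this (a sketch
assuming VMEMMAP_END is defined as suggested in the quoted text; not from
the actual series):

#include <linux/build_bug.h>

/* Assumed alias, per the quoted suggestion. */
#define VMEMMAP_END R_VMEMMAP_END

/*
 * Fail the build if the hard-coded KASAN_SHADOW_OFFSET in the config
 * ever disagrees with the VMEMMAP_END/KASAN_SHADOW_END layout.
 */
static inline void kasan_layout_check(void)
{
	BUILD_BUG_ON(VMEMMAP_END + KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);
}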
tions off after
> booting. Take this approach for now and require outline instrumentation.
>
> Previous attempts allowed inline instrumentation. However, they came with
> some unfortunate restrictions: only physically contiguous memory could be
> used and it had to be specified at compile time.
On Fri, Mar 19, 2021 at 01:25:27AM +0000, Matthew Wilcox wrote:
> On Fri, Mar 19, 2021 at 10:56:45AM +1100, Balbir Singh wrote:
> > On Fri, Mar 05, 2021 at 04:18:37AM +0000, Matthew Wilcox (Oracle) wrote:
> > > A struct folio refers to an entire (possibly compound) page. A fu
KASAN_VMALLOC
> bool
>
> +config ARCH_DISABLE_KASAN_INLINE
> + def_bool n
> +
Some comments on which arches want to disable inline KASAN, and why,
would be helpful.
Balbir Singh.
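One possible shape for the requested comment, as Kconfig help text (the
rationale is my inference from the quoted commit message, not the final
patch):

config ARCH_DISABLE_KASAN_INLINE
	def_bool n
	help
	  Selected by architectures that cannot support inline KASAN
	  instrumentation, for example because shadow memory is only
	  set up after boot; such architectures must rely on outline
	  instrumentation instead.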
/powerpc/kvm is instrumented. It's also potentially a bit
> fragile - if any real mode code paths call out to instrumented code, things
> will go boom.
>
The last time I checked, the changes for real mode made the code hard to
review/maintain. I am happy to see that we've decided to leave that off
the table for now; reviewing the series.
Balbir Singh.
he caller
> guarantees that the pointer it is passing does not point to a tail page.
>
Is this part of a larger use case or a general cleanup/refactor where
the split between page and folio simplifies programming?
Balbir Singh.
On Thu, Feb 25, 2021 at 09:21:26PM +0800, Muchun Song wrote:
> Because we reuse the first tail vmemmap page frame and remap it
> read-only, we cannot set PageHWPoison on some tail pages.
> So we can use the head[4].private (there are at least 128 struct
> page structures associated with
On Thu, Feb 25, 2021 at 09:21:25PM +0800, Muchun Song wrote:
> When we free a HugeTLB page to the buddy allocator, we should allocate
> the vmemmap pages associated with it. But we may not be able to allocate
> vmemmap pages when the system is under memory pressure; in this case, we
> just refuse to free
>
> Signed-off-by: Muchun Song
> Reviewed-by: Oscar Salvador
> Acked-by: Mike Kravetz
> Reviewed-by: Miaohe Lin
> ---
Reviewed-by: Balbir Singh
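The failure handling described in the quoted commit message, as a minimal
sketch (both helper names below are hypothetical, not the series' API):

/*
 * Before a HugeTLB page can go back to the buddy allocator, the
 * vmemmap pages freed at allocation time must be re-allocated; if
 * that fails under memory pressure, refuse the free and keep the
 * HugeTLB page. Helper names are illustrative only.
 */
static int hugetlb_free_to_buddy(struct page *head)
{
	if (restore_hugetlb_vmemmap(head))
		return -ENOMEM;		/* refuse to free the page */

	release_to_buddy(head);
	return 0;
}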
> > @@ -0,0 +1,124 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * linux/mm/bootmem_info.c
> > + *
> > + * Copyright (C)
>
> Looks like incomplete
>
My comment wasn't clear either; I should have said:
the copyright notice looks very incomplete.
Balbir Singh.
> > > [ASCII diagram elided: vmemmap layout for a 2MB HugeTLB page,
> > > showing struct pages 4-7 and their remapping]
> > >
> > > When a HugeTLB is freed to the buddy system, we should allocate 6
> > > pages for vmemmap pages and restore the previous mapping relationship.
> > >
> >
> > Can these 6 pages come from the hugeTLB page itself? When you say 6 pages,
> > I presume you mean 6 pages of PAGE_SIZE
>
> There was a decent discussion about this in a previous version of the
> series starting here:
>
> https://lore.kernel.org/linux-mm/20210126092942.GA10602@linux/
>
> In this thread various other options were suggested and discussed.
>
Thanks,
Balbir Singh
ability to check the types of the parameters passed in and out makes it
not so good. Not to mention versioning issues: with the genl interface we
have the flexibility to version requests. I would really hate to have two
ways to do the same thing.
The overhead is there, but do you consider an overhead of 20ms per 10,000
calls significant? Does it affect your use case significantly?
Balbir Singh
On Fri, Feb 05, 2021 at 10:43:02AM +0800, Weiping Zhang wrote:
> On Fri, Feb 5, 2021 at 8:08 AM Balbir Singh wrote:
> >
> > On Thu, Feb 04, 2021 at 10:37:20PM +0800, Weiping Zhang wrote:
> > > On Thu, Feb 4, 2021 at 6:20 PM Balbir Singh wrote:
> > > >
On Thu, Feb 04, 2021 at 10:37:20PM +0800, Weiping Zhang wrote:
> On Thu, Feb 4, 2021 at 6:20 PM Balbir Singh wrote:
> >
> > On Sun, Jan 31, 2021 at 05:16:47PM +0800, Weiping Zhang wrote:
> > > On Wed, Jan 27, 2021 at 7:13 PM Balbir Singh
> > > wrote:
> >
On Sun, Jan 31, 2021 at 05:16:47PM +0800, Weiping Zhang wrote:
> On Wed, Jan 27, 2021 at 7:13 PM Balbir Singh wrote:
> >
> > On Fri, Jan 22, 2021 at 10:07:50PM +0800, Weiping Zhang wrote:
> > > Hello Balbir Singh,
> > >
> > > Could you help review this patch
On Fri, Jan 22, 2021 at 10:07:50PM +0800, Weiping Zhang wrote:
> Hello Balbir Singh,
>
> Could you help review this patch, thanks
>
> On Mon, Dec 28, 2020 at 10:10 PM Weiping Zhang wrote:
> >
> > Hi David,
> >
> > Could you help review this patch ?
>
On Mon, Dec 28, 2020 at 10:10:03PM +0800, Weiping Zhang wrote:
> Hi David,
>
> Could you help review this patch ?
>
> thanks
I've got it on my review list, thanks for the ping!
You should hear back from me soon.
Balbir Singh.
>
> On Fri, Dec 18, 2020 at 1:24 AM W
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/20200510014803.12190-4-sbl...@amazon.com
Link: https://lore.kernel.org/r/20200729001103.6450-3-sbl...@amazon.com
---
arch/x86/include/asm/tlbflush.h | 2 +-
arch/x86/mm/tlb.c | 53
is called only when HW assisted
flushing is available.
Suggested-by: Thomas Gleixner
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/20200729001103.6450-4-sbl...@amazon.com
---
arch/x86/include/asm/cacheflush.h | 8
arch/x86/include/asm
There is also no seccomp integration for the feature.
Suggested-by: Thomas Gleixner
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
---
arch/Kconfig | 4 ++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/nospec-branch.h | 2 +
arch/x86
Suggested-by: Thomas Gleixner
Signed-off-by: Balbir Singh
---
arch/x86/include/asm/processor.h | 2 ++
arch/x86/kernel/smpboot.c | 10 +-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index
Add documentation of l1d flushing, explain the need for the
feature and how it can be used.
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
---
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/l1d_flush.rst | 70 +++
.../admin
at boot time, second
by the application
- Rename l1d_flush_out/L1D_FLUSH_OUT to l1d_flush/L1D_FLUSH
- Implement other review recommendations
Changelog v3:
- Implement the SIGBUS mechanism
- Update and fix the documentation
Balbir Singh (5):
x86/smp: Add a per-cpu view of SMT state
x86/mm
On Fri, Dec 04, 2020 at 11:19:17PM +0100, Thomas Gleixner wrote:
>
> Balbir,
>
> On Fri, Nov 27 2020 at 17:59, Balbir Singh wrote:
> > +enum l1d_flush_out_mitigations {
> > + L1D_FLUSH_OUT_OFF,
> > + L1D_FLUSH_OUT_ON,
> > +};
> > +
than just trace_printk()
Balbir Singh.
Same with
> that thread stuff.
>
> All this API stuff here is a complete and utter trainwreck. Please just
> delete the patches and start over. Hint: if you use stop_machine(),
> you're doing it wrong.
>
> At best you now have the requirements sorted.
+1, just remove this patch so as to unblock the rest of the series.
Balbir Singh.
next())) {
> - p->core_cookie = !!val ? (unsigned long)tg : 0UL;
> -
> - if (sched_core_enqueued(p)) {
> - sched_core_dequeue(task_rq(p), p);
> - if (!p->core_cookie)
> - continue;
> - }
> -
> - if (sched_core_enabled(task_rq(p)) &&
> - p->core_cookie && task_on_rq_queued(p))
> - sched_core_enqueue(task_rq(p), p);
> + unsigned long cookie = !!val ? (unsigned long)tg : 0UL;
>
> + sched_core_tag_requeue(p, cookie, true /* group */);
> }
> css_task_iter_end();
>
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 60a922d3f46f..8c452b8010ad 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -1024,6 +1024,10 @@ void proc_sched_show_task(struct task_struct *p,
> struct pid_namespace *ns,
> __PS("clock-delta", t1-t0);
> }
>
> +#ifdef CONFIG_SCHED_CORE
> + __PS("core_cookie", p->core_cookie);
> +#endif
> +
> sched_show_numa(p, m);
> }
>
Balbir Singh.
On Thu, Nov 26, 2020 at 05:26:31PM +0800, Li, Aubrey wrote:
> On 2020/11/26 16:32, Balbir Singh wrote:
> > On Thu, Nov 26, 2020 at 11:20:41AM +0800, Li, Aubrey wrote:
> >> On 2020/11/26 6:57, Balbir Singh wrote:
> >>> On Wed, Nov 25, 2020 at 11:12:53AM +0800, Li, Aubr
to run these patches for testing? Bochs emulation or anything
else? I presume you've been testing against violations of CET in user space?
Can you share your testing?
Balbir Singh.
Add documentation of l1d flushing, explain the need for the
feature and how it can be used.
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
---
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/l1d_flush.rst | 69 +++
.../admin
is called only when HW assisted
flushing is available.
Suggested-by: Thomas Gleixner
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/20200729001103.6450-4-sbl...@amazon.com
---
arch/x86/include/asm/cacheflush.h | 8
arch/x86/include/asm
There is also no seccomp integration for the feature.
Suggested-by: Thomas Gleixner
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
---
arch/Kconfig | 4 +++
arch/x86/Kconfig | 1 +
arch/x86/kernel/cpu/bugs.c | 54
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/20200510014803.12190-4-sbl...@amazon.com
Link: https://lore.kernel.org/r/20200729001103.6450-3-sbl...@amazon.com
---
arch/x86/include/asm/tlbflush.h | 2 +-
arch/x86/mm/tlb.c | 53
Detection of task affinities at API opt-in time is not the best
approach; instead, the task is killed if it runs on an SMT-enabled
core. This is better than not flushing the L1D cache when the task
switches from a non-SMT core to an SMT-enabled core.
Signed-off-by: Balbir Singh
---
arch/x86
-data-sampling
[3] https://lkml.org/lkml/2020/6/2/1150
[4] https://lore.kernel.org/lkml/20200729001103.6450-1-sbl...@amazon.com/
[5] https://lore.kernel.org/lkml/20201117234934.25985-2-sbl...@amazon.com/
Changelog v3:
- Implement the SIGBUS mechanism
- Update and fix the documentation
Balbir Singh
On Thu, Nov 26, 2020 at 09:29:14AM +0100, Peter Zijlstra wrote:
> On Thu, Nov 26, 2020 at 10:05:19AM +1100, Balbir Singh wrote:
> > > @@ -5259,7 +5254,20 @@ pick_next_task(struct rq *rq, struct task_struct
> > > *prev, struct rq_flags *rf)
> > >
On Thu, Nov 26, 2020 at 11:20:41AM +0800, Li, Aubrey wrote:
> On 2020/11/26 6:57, Balbir Singh wrote:
> > On Wed, Nov 25, 2020 at 11:12:53AM +0800, Li, Aubrey wrote:
> >> On 2020/11/24 23:42, Peter Zijlstra wrote:
> >>> On Mon, Nov 23, 2020 at 12:36:10PM +0800, Li,
ecause we are
> + * still in it on this CPU.
> + */
> + nest = rq->core->core_unsafe_nest;
> + WARN_ON_ONCE(!nest);
> +
> + WRITE_ONCE(rq->core->core_unsafe_nest, nest - 1);
> + /*
> + * The raw_spin_unlock release semantics pairs with the nest counter's
> + * smp_load_acquire() in sched_core_wait_till_safe().
> + */
> + raw_spin_unlock(rq_lockp(rq));
> +ret:
> + local_irq_restore(flags);
> +}
> +
> // XXX fairness/fwd progress conditions
> /*
> * Returns
> @@ -5497,6 +5737,7 @@ static inline void sched_core_cpu_starting(unsigned int
> cpu)
> rq = cpu_rq(i);
> if (rq->core && rq->core == rq)
> core_rq = rq;
> + init_sched_core_irq_work(rq);
> }
>
> if (!core_rq)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 615092cb693c..be6691337bbb 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1074,6 +1074,8 @@ struct rq {
> unsigned int core_enabled;
> unsigned int core_sched_seq;
> struct rb_root core_tree;
> + struct irq_work core_irq_work; /* To force HT into kernel */
> + unsigned int core_this_unsafe_nest;
>
> /* shared state */
> unsigned int core_task_seq;
> @@ -1081,6 +1083,7 @@ struct rq {
> unsigned long core_cookie;
> unsigned char core_forceidle;
> unsigned int core_forceidle_seq;
> + unsigned int core_unsafe_nest;
> #endif
> };
>
Balbir Singh.
On Tue, Nov 24, 2020 at 09:16:17AM +0100, Peter Zijlstra wrote:
> On Sun, Nov 22, 2020 at 08:11:52PM +1100, Balbir Singh wrote:
> > On Tue, Nov 17, 2020 at 06:19:34PM -0500, Joel Fernandes (Google) wrote:
> > > From: Peter Zijlstra
> > >
> > > Introduce the
CONFIG_ thing). Even on AMD systems RT tasks might want to claim the
> core exclusively.
Agreed, specifically if we need to have special cgroup tag/association to
enable it.
Balbir Singh.
On Fri, Nov 20, 2020 at 11:58:54AM -0500, Joel Fernandes wrote:
> On Fri, Nov 20, 2020 at 10:56:09AM +1100, Singh, Balbir wrote:
> [..]
> > > +#ifdef CONFIG_SMP
> > > +static struct task_struct *pick_task_fair(struct rq *rq)
> > > +{
> > > + struct cfs_rq *cfs_rq = &rq->cfs;
> > > + struct
On Tue, Nov 24, 2020 at 10:09:55AM +0100, Peter Zijlstra wrote:
> On Tue, Nov 24, 2020 at 10:31:49AM +1100, Balbir Singh wrote:
> > On Mon, Nov 23, 2020 at 07:31:31AM -0500, Vineeth Pillai wrote:
> > > Hi Balbir,
> > >
> > > On 11/22/20 6:44 AM, Balbir Sing
On Tue, Nov 24, 2020 at 01:30:38PM -0500, Joel Fernandes wrote:
> On Mon, Nov 23, 2020 at 09:41:23AM +1100, Balbir Singh wrote:
> > On Tue, Nov 17, 2020 at 06:19:40PM -0500, Joel Fernandes (Google) wrote:
> > > From: Peter Zijlstra
> > >
> > > The rationale
On Wed, Nov 25, 2020 at 11:12:53AM +0800, Li, Aubrey wrote:
> On 2020/11/24 23:42, Peter Zijlstra wrote:
> > On Mon, Nov 23, 2020 at 12:36:10PM +0800, Li, Aubrey wrote:
> +#ifdef CONFIG_SCHED_CORE
> +/*
> + * Skip this cpu if source task's cookie does
On Tue, Nov 24, 2020 at 08:32:01AM +0800, Li, Aubrey wrote:
> On 2020/11/24 7:35, Balbir Singh wrote:
> > On Mon, Nov 23, 2020 at 11:07:27PM +0800, Li, Aubrey wrote:
> >> On 2020/11/23 12:38, Balbir Singh wrote:
> >>> On Tue, Nov 17, 2020 at 06:19:43PM -0500,
On Mon, Nov 23, 2020 at 11:07:27PM +0800, Li, Aubrey wrote:
> On 2020/11/23 12:38, Balbir Singh wrote:
> > On Tue, Nov 17, 2020 at 06:19:43PM -0500, Joel Fernandes (Google) wrote:
> >> From: Peter Zijlstra
> >>
> >> When a sibling is forced-idle to match the c
On Mon, Nov 23, 2020 at 07:31:31AM -0500, Vineeth Pillai wrote:
> Hi Balbir,
>
> On 11/22/20 6:44 AM, Balbir Singh wrote:
> >
> > This seems cumbersome, is there no way to track the min_vruntime via
> > rq->core->min_vruntime?
> Do you mean to have a core w
> put_prev_task before calling pick_task_fair. But for coresched, we
> call pick_task_fair on siblings while the task is running and would
> not be able to call put_prev_task. So this refactor of the code fixes
> the crash by explicitly passing curr.
>
> Hope this clarifies..
>
Yes, it does!
Thanks,
Balbir Singh.
d by the series to determine if waiting is
> needed or not, during exit to user or guest mode.
>
> Tested-by: Julien Desfossez
> Reviewed-by: Aubrey Li
> Signed-off-by: Joel Fernandes (Google)
> ---
Acked-by: Balbir Singh
presume we are looking at either one or two cpus
to define the core_occupation and we expect to match it against the
destination CPU.
Balbir Singh.
return true;
> +
> + for_each_cpu(cpu, cpu_smt_mask(cpu_of(rq))) {
> + if (!available_idle_cpu(cpu)) {
I was looking at this snippet and comparing it to is_core_idle(); the
major difference is the check for vcpu_is_preempted(). Do we want to
consider the core non-idle if any vCPU was preempted on one of its CPUs?
> + idle_core = false;
> + break;
> + }
> + }
> +
> + /*
> + * A CPU in an idle core is always the best choice for tasks with
> + * cookies.
> + */
> + return idle_core || rq->core->core_cookie == p->core_cookie;
> +}
> +
Balbir Singh.
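For reference, the quoted loop amounts to something like the sketch below;
available_idle_cpu() is idle_cpu() plus a !vcpu_is_preempted() check, which
is the difference from is_core_idle() being asked about (illustrative, not
the patch itself):

static bool smt_core_fully_idle(struct rq *rq)
{
	int cpu;

	/*
	 * A core counts as idle only if every SMT sibling is idle and
	 * is not a preempted vCPU; the latter is the extra condition
	 * relative to is_core_idle().
	 */
	for_each_cpu(cpu, cpu_smt_mask(cpu_of(rq))) {
		if (!available_idle_cpu(cpu))
			return false;
	}

	return true;
}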
On Tue, Nov 17, 2020 at 06:19:40PM -0500, Joel Fernandes (Google) wrote:
> From: Peter Zijlstra
>
> The rationale is as follows. In the core-wide pick logic, even if
> need_sync == false, we need to go look at other CPUs (non-local CPUs) to
> see if they could be running RT.
>
> Say the RQs in
easier. Further, it may make reverting the improvement easier in
> case the improvement causes any regression.
>
This seems cumbersome, is there no way to track the min_vruntime via
rq->core->min_vruntime?
Balbir Singh.
P */
>
> +#ifdef CONFIG_SCHED_CORE
> +static inline bool
> +__entity_slice_used(struct sched_entity *se, int min_nr_tasks)
> +{
> + u64 slice = sched_slice(cfs_rq_of(se), se);
I wonder if the definition of sched_slice() should be revisited for core
scheduling? Should we use sched_slice = sched_slice / cpumask_weight(smt_mask)?
Would that resolve the issue you're seeing? Effectively we need to answer
whether two SMT siblings under core scheduling should be treated as
executing one large slice.
Balbir Singh.
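In code, that suggestion reads roughly as follows (a sketch; whether
dividing by the sibling count is the right semantics is exactly the open
question):

#ifdef CONFIG_SCHED_CORE
/*
 * Sketch of the suggestion above: treat the SMT siblings of a core
 * as sharing one large slice by scaling the per-entity slice down
 * by the number of siblings. Illustrative only.
 */
static inline u64 core_sched_entity_slice(struct sched_entity *se)
{
	const struct cpumask *smt_mask = cpu_smt_mask(smp_processor_id());

	return sched_slice(cfs_rq_of(se), se) / cpumask_weight(smt_mask);
}
#endif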
Is it possible to have some
cores with core sched disabled? I don't see a strong use case for it,
but I am wondering if the design will fall apart if that assumption is
broken?
Balbir Singh
dynamic based on whether core sched is enabled or not (both statically and
> dynamically).
>
My point was that the word game does not do justice to the change; some
details on how this abstraction helps, based on the (re)definition of rq
with coresched, might help.
Balbir Singh.
Update the documentation to mention that a SIGBUS will be sent
to tasks that opt in to L1D flushing and execute on non-SMT cores.
Signed-off-by: Balbir Singh
---
To be applied on top of tip commit id
767d46ab566dd489733666efe48732d523c8c332
Documentation/admin-guide/hw-vuln/l1d_flush.rst | 8
Add a label to spec_set_ctrl to remove the build warning.
Signed-off-by: Balbir Singh
---
To be applied on top of tip commit id
767d46ab566dd489733666efe48732d523c8c332
Documentation/admin-guide/hw-vuln/l1d_flush.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git
Detection of task affinities at API opt-in time is not the best
approach; instead, the task is killed if it runs on an SMT-enabled
core. This is better than not flushing the L1D cache when the task
switches from a non-SMT core to an SMT-enabled core.
Signed-off-by: Balbir Singh
the SIGBUS behaviour,
there needs to be contention on the CPU where the task that opts
into L1D flushing is running to see the SIGBUS being sent to it
(the deterministic bit is that if there is scope for a data leak,
the task will get killed)
Balbir Singh (3):
x86/mm: change l1d flush runtime prctl
On 11/16/20 9:23 AM, Balbir Singh wrote:
>
>
> On 10/28/20 2:23 PM, Stephen Rothwell wrote:
>> Hi all,
>>
>> After merging the tip tree, today's linux-next build (htmldocs) produced
>> this warning:
>>
>> Documentation/admin-guide/hw-vuln/l
.rst (if the link has no caption the
> label must precede a section header)
>
> Introduced by commit
>
> 767d46ab566d ("Documentation: Add L1D flushing Documentation")
>
Looking at it, thanks. I am no expert with Sphinx, but it seems I need
angle brackets around the link.
Balbir Singh.
The following commit has been merged into the x86/pti branch of tip:
Commit-ID: b6724f118d44606fddde391ba7527526b3cad211
Gitweb:
https://git.kernel.org/tip/b6724f118d44606fddde391ba7527526b3cad211
Author: Balbir Singh
AuthorDate: Wed, 29 Jul 2020 10:11:02 +10:00
Committer
The following commit has been merged into the x86/pti branch of tip:
Commit-ID: a9210620ec360f7375282ff1d35c8f8016ccc986
Gitweb:
https://git.kernel.org/tip/a9210620ec360f7375282ff1d35c8f8016ccc986
Author: Balbir Singh
AuthorDate: Wed, 29 Jul 2020 10:11:01 +10:00
Committer
The following commit has been merged into the x86/pti branch of tip:
Commit-ID: 81f449985c12b83b91849d94724b803ebf856301
Gitweb:
https://git.kernel.org/tip/81f449985c12b83b91849d94724b803ebf856301
Author: Balbir Singh
AuthorDate: Wed, 29 Jul 2020 10:11:00 +10:00
Committer
The following commit has been merged into the x86/pti branch of tip:
Commit-ID: 0a260b1c5867863121b044afa8087d6b37e4fb7d
Gitweb:
https://git.kernel.org/tip/0a260b1c5867863121b044afa8087d6b37e4fb7d
Author: Balbir Singh
AuthorDate: Wed, 29 Jul 2020 10:10:59 +10:00
Committer
The following commit has been merged into the x86/pti branch of tip:
Commit-ID: 767d46ab566dd489733666efe48732d523c8c332
Gitweb:
https://git.kernel.org/tip/767d46ab566dd489733666efe48732d523c8c332
Author: Balbir Singh
AuthorDate: Wed, 29 Jul 2020 10:11:03 +10:00
Committer
Suggested-by: Thomas Gleixner
Signed-off-by: Balbir Singh
---
arch/x86/include/asm/processor.h | 2 ++
arch/x86/kernel/smpboot.c | 11 ++-
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index
Balbir Singh (5):
Add a per-cpu view of SMT state
x86/mm: Refactor cond_ibpb() to support other use cases
x86/mm: Optionally flush L1D on context switch
prctl: Hook L1D flushing in via prctl
Documentation: Add L1D flushing Documentation
References:
[1]
https://software.intel.com/security
Use the existing PR_GET/SET_SPECULATION_CTRL API to expose the L1D
flush capability. For L1D flushing PR_SPEC_FORCE_DISABLE and
PR_SPEC_DISABLE_NOEXEC are not supported.
There is also no seccomp integration for the feature.
Signed-off-by: Balbir Singh
---
arch/x86/kernel/cpu/bugs.c | 54
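From userspace, opting in would look roughly like the following (the
constant name changed across revisions of this series, PR_SPEC_L1D_FLUSH_OUT
vs PR_SPEC_L1D_FLUSH, so treat the name as an assumption):

#include <stdio.h>
#include <sys/prctl.h>

/*
 * Hypothetical opt-in to per-task L1D flushing on context switch via
 * the speculation-control prctl described above.
 */
int enable_l1d_flush(void)
{
	if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH,
		  PR_SPEC_ENABLE, 0, 0)) {
		perror("PR_SET_SPECULATION_CTRL");
		return -1;
	}

	return 0;
}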
Add documentation of l1d flushing, explain the need for the
feature and how it can be used.
Signed-off-by: Balbir Singh
---
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/l1d_flush.rst | 70 +++
.../admin-guide/kernel-parameters.txt
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/20200510014803.12190-4-sbl...@amazon.com
---
arch/x86/include/asm/tlbflush.h | 2 +-
arch/x86/mm/tlb.c | 53 ++---
2 files changed, 30 insertions(+), 25 deletions(-)
diff --git
is called only when HW assisted
flushing is available.
Suggested-by: Thomas Gleixner
Signed-off-by: Balbir Singh
---
arch/x86/include/asm/cacheflush.h | 8
arch/x86/include/asm/thread_info.h | 9 +++--
arch/x86/mm/tlb.c | 30 +++---
3 files
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 0fcfdf55db9e1ecf85edd6aa8d0bc78a448cb96a
Gitweb:
https://git.kernel.org/tip/0fcfdf55db9e1ecf85edd6aa8d0bc78a448cb96a
Author: Balbir Singh
AuthorDate: Sat, 16 May 2020 20:34:30 +10:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: e3efae20ec69e9a8c9db1ad81b37de629219bbc4
Gitweb:
https://git.kernel.org/tip/e3efae20ec69e9a8c9db1ad81b37de629219bbc4
Author: Balbir Singh
AuthorDate: Sun, 10 May 2020 11:47:59 +10:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 20fc9f6f9f2fefefb694c9e447f80b4772021e0a
Gitweb:
https://git.kernel.org/tip/20fc9f6f9f2fefefb694c9e447f80b4772021e0a
Author: Balbir Singh
AuthorDate: Sat, 16 May 2020 20:34:28 +10:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: edf7ce0b231cb6cdc170125588cf71c70358fc74
Gitweb:
https://git.kernel.org/tip/edf7ce0b231cb6cdc170125588cf71c70358fc74
Author: Balbir Singh
AuthorDate: Sat, 16 May 2020 20:34:29 +10:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: b9b3bc1c30be1f056c1c0564bc7268820ea8bf70
Gitweb:
https://git.kernel.org/tip/b9b3bc1c30be1f056c1c0564bc7268820ea8bf70
Author: Balbir Singh
AuthorDate: Sun, 10 May 2020 11:47:58 +10:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 3f768f0032dbc0657ed7e48f4735a3c4e49e25d7
Gitweb:
https://git.kernel.org/tip/3f768f0032dbc0657ed7e48f4735a3c4e49e25d7
Author: Balbir Singh
AuthorDate: Sun, 10 May 2020 11:48:01 +10:00
Committer
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 83ce56f712af79eac5f761e6b058359336803500
Gitweb:
https://git.kernel.org/tip/83ce56f712af79eac5f761e6b058359336803500
Author: Balbir Singh
AuthorDate: Sun, 10 May 2020 11:48:00 +10:00
Committer
S);
> if (!PageHuge(new))
> __inc_node_page_state(new, NR_FILE_PAGES);
> if (PageSwapBacked(old))
> - __dec_node_page_state(new, NR_SHMEM);
> + __dec_node_page_state(old, NR_SHMEM);
> if (PageSwapBacked(new))
> __inc_node_page_state(new, NR_SHMEM);
> xas_unlock_irqrestore(&xas, flags);
Reviewed-by: Balbir Singh
On 14/5/20 10:00 pm, Matthew Wilcox wrote:
> On Thu, May 14, 2020 at 09:00:40PM +1000, Balbir Singh wrote:
>> I wonder if the right thing to do is also to disable pre-emption, just so
>> that the thread does not linger on with sensitive data.
>>
>> void kvfree
decide when to flush the
L1D cache.
Suggested-by: Thomas Gleixner
Signed-off-by: Balbir Singh
---
arch/x86/include/asm/thread_info.h | 9 +--
arch/x86/mm/tlb.c | 39 +++---
2 files changed, 43 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include
Add documentation of l1d flushing, explain the need for the
feature and how it can be used.
[tglx: Reword the documentation]
Signed-off-by: Thomas Gleixner
Signed-off-by: Balbir Singh
Reviewed-by: Kees Cook
---
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln
Use the existing PR_GET/SET_SPECULATION_CTRL API to expose the L1D
flush capability. For L1D flushing PR_SPEC_FORCE_DISABLE and
PR_SPEC_DISABLE_NOEXEC are not supported.
There is also no seccomp integration for the feature.
Suggested-by: Thomas Gleixner
Signed-off-by: Balbir Singh
---
arch
- Reuse the prctl PR_GET/SET_SPECULATION_CTRL with PR_SPEC_L1D_FLUSH_OUT
as the ctrl parameter
- Add the BUILD_BUG_ON that went missing that checks the placement of
TIF_SPEC_L1D_FLUSH
- Update the documentation to reflect the changes
Balbir Singh (3):
x86/mm: Optionally flush L1D on context
e, it's just much simpler code.
Makes sense! Thanks, I should spend some time to re-read all of the
memcontrol.c code :)
Balbir Singh.
}
> +}
> +EXPORT_SYMBOL(kvfree_sensitive);
> +
I wonder if the right thing to do is also to disable pre-emption, just so that
the thread does not linger on with sensitive data.
void kvfree_sensitive(const void *addr, size_t len)
{
	preempt_disable();
	if (likely(!ZERO_OR_NULL_PTR(addr))) {
		memzero_explicit((void *)addr, len);
		kvfree(addr);
	}
	preempt_enable();
}
EXPORT_SYMBOL(kvfree_sensitive);
Balbir Singh.
c vmstat counters, the charge sequence must be
> adjusted such that page->mem_cgroup is set up by the time these
> counters are modified.
>
> The series is structured as follows:
>
> 1. Bug fixes
> 2. Decoupling charging from rmap
> 3. Swap controller integration into memcg
> 4. Direct swapin charging
>
Thanks,
Balbir Singh.
abstracted under arch_l1d_flush().
vmx_l1d_flush_mutex however continues to exist, as it is also used
from other code paths.
Suggested-by: Thomas Gleixner
Signed-off-by: Balbir Singh
---
arch/x86/include/asm/cacheflush.h | 12 +++---
arch/x86/kernel/l1d_flush.c | 64
Refactor the existing assembly bits into smaller helper functions
and also abstract L1D_FLUSH into a helper function. Use these
functions in kvm for L1D flushing.
Reviewed-by: Kees Cook
Signed-off-by: Balbir Singh
---
arch/x86/include/asm/cacheflush.h | 3 ++
arch/x86/kernel/l1d_flush.c
Add documentation of l1d flushing, explain the need for the
feature and how it can be used.
Signed-off-by: Balbir Singh
Reviewed-by: Kees Cook
---
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/l1d_flush.rst | 40 +++
2 files changed, 41
Split out the allocation and free routines to be used in a follow
up set of patches (to reuse for L1D flushing).
Signed-off-by: Balbir Singh
Reviewed-by: Kees Cook
---
arch/x86/include/asm/cacheflush.h | 3 +++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/l1d_flush.c | 36