Re: [PATCH] locking: remove spin_lock_flags() etc

2021-10-25 Thread Peter Zijlstra
On Sat, Oct 23, 2021 at 06:04:57PM +0200, Arnd Bergmann wrote: > On Sat, Oct 23, 2021 at 3:37 AM Waiman Long wrote: > >> On 10/22/21 7:59 AM, Arnd Bergmann wrote: > > > From: Arnd Bergmann > > > > > > As this is all dead code, just remove it and the helper functions built > > > around it. For arc

Re: [PATCH v3 0/4] Add mem_hops field in perf_mem_data_src structure

2021-10-18 Thread Peter Zijlstra
On Mon, Oct 18, 2021 at 02:46:18PM +1100, Michael Ellerman wrote: > Peter Zijlstra writes: > > On Wed, Oct 06, 2021 at 07:36:50PM +0530, Kajol Jain wrote: > > > >> Kajol Jain (4): > >> perf: Add comment about current state of PERF_MEM_LVL_* namespace

Re: [PATCH 2/2] sched: Centralize SCHED_{SMT, MC, CLUSTER} definitions

2021-10-15 Thread Peter Zijlstra
On Fri, Oct 08, 2021 at 04:22:27PM +0100, Valentin Schneider wrote: > So x86 has it default yes, and a lot of others (e.g. arm64) have it default > no. > > IMO you don't gain much by disabling them. SCHED_MC and SCHED_CLUSTER only > control the presence of a sched_domain_topology_level - if it's

Re: [PATCH] tracing: Have all levels of checks prevent recursion

2021-10-15 Thread Peter Zijlstra
On Fri, Oct 15, 2021 at 02:20:33PM -0400, Steven Rostedt wrote: > On Fri, 15 Oct 2021 20:04:29 +0200 > Peter Zijlstra wrote: > > > On Fri, Oct 15, 2021 at 01:58:06PM -0400, Steven Rostedt wrote: > > > Something like this: > > > > I think having one cop

Re: [PATCH] tracing: Have all levels of checks prevent recursion

2021-10-15 Thread Peter Zijlstra
On Fri, Oct 15, 2021 at 01:58:06PM -0400, Steven Rostedt wrote: > Something like this: I think having one copy of that in a header is better than having 3 copies. But yes, something along them lines.

Re: [PATCH] tracing: Have all levels of checks prevent recursion

2021-10-15 Thread Peter Zijlstra
On Fri, Oct 15, 2021 at 11:00:35AM -0400, Steven Rostedt wrote: > From: "Steven Rostedt (VMware)" > > While writing an email explaining the "bit = 0" logic for a discussion on > bit = trace_get_context_bit() + start; While there, you were also going to update that function to match/use ge

Re: [PATCH 2/2] ftrace: prevent preemption in perf_ftrace_function_call()

2021-10-12 Thread Peter Zijlstra
On Tue, Oct 12, 2021 at 01:40:31PM +0800, 王贇 wrote: > diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c > index 6aed10e..33c2f76 100644 > --- a/kernel/trace/trace_event_perf.c > +++ b/kernel/trace/trace_event_perf.c > @@ -441,12 +441,19 @@ void perf_trace_buf_update(vo

Re: [PATCH v3 0/4] Add mem_hops field in perf_mem_data_src structure

2021-10-06 Thread Peter Zijlstra
-- > tools/include/uapi/linux/perf_event.h | 19 --- > tools/perf/util/mem-events.c | 20 ++-- > 5 files changed, 73 insertions(+), 13 deletions(-) Acked-by: Peter Zijlstra (Intel) How do we want this routed? Shall I take it, or does Michael want it in the Power tree?

Re: [PATCH 2/4] perf: Add mem_hops field in perf_mem_data_src structure

2021-10-05 Thread Peter Zijlstra
On Tue, Oct 05, 2021 at 02:48:35PM +0530, Kajol Jain wrote: > Going forward, future generation systems can have more hierarchy > within the chip/package level but currently we don't have any data source > encoding field in perf, which can be used to represent this level of data. > > Add a new fiel

Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance

2021-09-17 Thread Peter Zijlstra
Neri Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Joel Fernandes (Google) Reviewed-by: Len Brown Link: https://lkml.kernel.org/r/20210911011819.12184-7-ricardo.neri-calde...@linux.intel.com --- kernel/sched/fair.c | 92 1 file chang

Re: [PATCH 1/3] perf: Add macros to specify onchip L2/L3 accesses

2021-09-14 Thread Peter Zijlstra
On Tue, Sep 14, 2021 at 08:40:38PM +1000, Michael Ellerman wrote: > Peter Zijlstra writes: > > I'm thinking we ought to keep hops as steps along the NUMA fabric, with > > 0 hops being the local node. That only gets us: > > > > L2, remote=0, hops=HOPS_0 -- our L

Re: [PATCH 1/3] perf: Add macros to specify onchip L2/L3 accesses

2021-09-09 Thread Peter Zijlstra
On Thu, Sep 09, 2021 at 10:45:54PM +1000, Michael Ellerman wrote: > > The 'new' composite doesn't have a hops field because the hardware that > > necessitated that change doesn't report it, but we could easily add a > > field there. > > > > Suppose we add, mem_hops:3 (would 6 hops be too small?) an

Re: [PATCH 1/3] perf: Add macros to specify onchip L2/L3 accesses

2021-09-08 Thread Peter Zijlstra
On Wed, Sep 08, 2021 at 05:17:53PM +1000, Michael Ellerman wrote: > Kajol Jain writes: > > diff --git a/include/uapi/linux/perf_event.h > > b/include/uapi/linux/perf_event.h > > index f92880a15645..030b3e990ac3 100644 > > --- a/include/uapi/linux/perf_event.h > > +++ b/include/uapi/linux/perf_ev

Re: [PATCH v3] powerpc/32: Add support for out-of-line static calls

2021-09-01 Thread Peter Zijlstra
you'd tried PREEMPT_DYNAMIC, since that should really stress the thing, but I see that also requires GENERIC_ENTRY and you don't have that. Alas. Acked-by: Peter Zijlstra (Intel)

Re: [PATCH v2] powerpc/32: Add support for out-of-line static calls

2021-08-31 Thread Peter Zijlstra
On Tue, Aug 31, 2021 at 01:12:26PM +0000, Christophe Leroy wrote: > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index 36b72d972568..a0fe69d8ec83 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -247,6 +247,7 @@ config PPC > select HAVE_SOFTIRQ_ON_OWN_STACK

Re: [PATCH] powerpc/32: Add support for out-of-line static calls

2021-08-31 Thread Peter Zijlstra
On Tue, Aug 31, 2021 at 08:05:21AM +0000, Christophe Leroy wrote: > +#define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) \ > + asm(".pushsection .text, \"ax\" \n" \ > + ".align 4 \n" \ > +

Re: [PATCH v4 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance

2021-08-27 Thread Peter Zijlstra
On Fri, Aug 27, 2021 at 12:13:42PM +0200, Vincent Guittot wrote: > > +/** > > + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull > > tasks > > + * @dst_cpu: Destination CPU of the load balancing > > + * @sds: Load-balancing data with statistics of the local group >

Re: [RFC PATCH] powerpc: Investigate static_call concept

2021-08-27 Thread Peter Zijlstra
On Fri, Aug 27, 2021 at 09:45:37AM +0000, Christophe Leroy wrote: > This RFC is to validate the concept of static_call on powerpc. > > Highly copied from x86. > > It replaces ppc_md.get_irq() which is called at every IRQ, by > a static call. The code looks saner, but does it actually improve per

Re: [PATCH v2 0/3] Updates to powerpc for robust CPU online/offline

2021-08-23 Thread Peter Zijlstra
On Mon, Aug 23, 2021 at 03:04:37PM +0530, Srikar Dronamraju wrote: > * Peter Zijlstra [2021-08-23 10:33:30]: > > > On Sat, Aug 21, 2021 at 03:55:32PM +0530, Srikar Dronamraju wrote: > > > Scheduler expects unique number of node distances to be available > > > at

Re: [PATCH v2 0/3] Updates to powerpc for robust CPU online/offline

2021-08-23 Thread Peter Zijlstra
On Sat, Aug 21, 2021 at 03:55:32PM +0530, Srikar Dronamraju wrote: > Scheduler expects unique number of node distances to be available > at boot. It uses node distance to calculate this unique node > distances. On Power Servers, node distances for offline nodes is not > available. However, Power Se

Re: [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE

2021-08-04 Thread Peter Zijlstra
On Wed, Aug 04, 2021 at 04:39:44PM +1000, Nicholas Piggin wrote: > For that matter, I wonder if we shouldn't do something like this > (untested) so the low level batch flush has visibility to the high > level flush range. > > x86 could use this too AFAIKS, just needs to pass the range a bit > fur

Re: PowerPC guest getting "BUG: scheduling while atomic" on linux-next-20210623 during secondary CPUs bringup

2021-06-25 Thread Peter Zijlstra
On Fri, Jun 25, 2021 at 02:23:16PM +0530, Bharata B Rao wrote: > On Fri, Jun 25, 2021 at 09:28:09AM +0200, Peter Zijlstra wrote: > > On Fri, Jun 25, 2021 at 11:16:08AM +0530, Srikar Dronamraju wrote: > > > * Bharata B Rao [2021-06-24 21:25:09]: > > > > >

Re: PowerPC guest getting "BUG: scheduling while atomic" on linux-next-20210623 during secondary CPUs bringup

2021-06-25 Thread Peter Zijlstra
On Fri, Jun 25, 2021 at 11:16:08AM +0530, Srikar Dronamraju wrote: > * Bharata B Rao [2021-06-24 21:25:09]: > > > A PowerPC KVM guest gets the following BUG message when booting > > linux-next-20210623: > > > > smp: Bringing up secondary CPUs ... > > BUG: scheduling while atomic: swapper/1/0/0x0

Re: [PATCH v3 0/4] Add perf interface to expose nvdimm

2021-06-23 Thread Peter Zijlstra
On Wed, Jun 23, 2021 at 01:40:38PM +0530, kajoljain wrote: > > > On 6/22/21 6:44 PM, Peter Zijlstra wrote: > > On Thu, Jun 17, 2021 at 06:56:13PM +0530, Kajol Jain wrote: > >> --- > >> Kajol Jain (4): > >> drivers/nvdimm: Add nvdimm pmu structure

Re: [PATCH v3 0/4] Add perf interface to expose nvdimm

2021-06-22 Thread Peter Zijlstra
ent papr_scm sysfs event format entries Don't see anything obviously wrong with this one. Acked-by: Peter Zijlstra (Intel)

Re: [PATCH 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats

2021-06-08 Thread Peter Zijlstra
On Tue, Jun 08, 2021 at 05:26:58PM +0530, Kajol Jain wrote: > +static int nvdimm_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node) > +{ > + struct nvdimm_pmu *nd_pmu; > + u32 target; > + int nodeid; > + const struct cpumask *cpumask; > + > + nd_pmu = hlist_entry_safe(no

Re: [PATCH 1/3] sched/topology: Allow archs to populate distance map

2021-05-28 Thread Peter Zijlstra
On Mon, May 24, 2021 at 09:48:29PM +0530, Srikar Dronamraju wrote: > * Valentin Schneider [2021-05-24 15:16:09]: > > I suppose one way to avoid the hook would be to write some "fake" distance > > values into your distance_lookup_table[] for offline nodes using your > > distance_ref_point_depth th

Re: [RFC v2 4/4] powerpc/papr_scm: Add cpu hotplug support for nvdimm pmu device

2021-05-26 Thread Peter Zijlstra
On Wed, May 26, 2021 at 12:56:58PM +0530, kajoljain wrote: > On 5/25/21 7:46 PM, Peter Zijlstra wrote: > > On Tue, May 25, 2021 at 06:52:16PM +0530, Kajol Jain wrote: > >> It adds cpumask to designate a cpu to make HCALL to > >> collect the counter data for the nvdimm

Re: [RFC v2 4/4] powerpc/papr_scm: Add cpu hotplug support for nvdimm pmu device

2021-05-25 Thread Peter Zijlstra
On Tue, May 25, 2021 at 06:52:16PM +0530, Kajol Jain wrote: > Patch here adds cpu hotplug functions to nvdimm pmu. I'm thinking "Patch here" qualifies for "This patch", see Documentation/process/submitting-patches.rst . > It adds cpumask to designate a cpu to make HCALL to > collect the counter d

Re: [PATCH 1/3] sched/topology: Allow archs to populate distance map

2021-05-21 Thread Peter Zijlstra
On Fri, May 21, 2021 at 08:08:02AM +0530, Srikar Dronamraju wrote: > * Peter Zijlstra [2021-05-20 20:56:31]: > > > On Thu, May 20, 2021 at 09:14:25PM +0530, Srikar Dronamraju wrote: > > > Currently scheduler populates the distance map by looking at distance > >

Re: [PATCH 1/3] sched/topology: Allow archs to populate distance map

2021-05-20 Thread Peter Zijlstra
On Thu, May 20, 2021 at 09:14:25PM +0530, Srikar Dronamraju wrote: > Currently scheduler populates the distance map by looking at distance > of each node from all other nodes. This should work for most > architectures and platforms. > > However there are some architectures like POWER that may not

Re: [PATCH v3 5/6] sched/fair: Consider SMT in ASYM_PACKING load balance

2021-05-19 Thread Peter Zijlstra
On Tue, May 18, 2021 at 12:07:40PM -0700, Ricardo Neri wrote: > On Fri, May 14, 2021 at 07:14:15PM -0700, Ricardo Neri wrote: > > On Fri, May 14, 2021 at 11:47:45AM +0200, Peter Zijlstra wrote: > > > So I'm thinking that this is a property of having ASYM_PACKING at a core

Re: [RFC 1/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats

2021-05-14 Thread Peter Zijlstra
On Thu, May 13, 2021 at 05:56:14PM +0530, kajoljain wrote: > But yes the current read/add/del functions are not adding value. We > could add an arch/platform specific function which could handle the > capturing of the counter data and do the rest of the operation here, > is this approach better?

Re: [RFC 1/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats

2021-05-12 Thread Peter Zijlstra
On Wed, May 12, 2021 at 10:08:21PM +0530, Kajol Jain wrote: > +static void nvdimm_pmu_read(struct perf_event *event) > +{ > + struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu); > + > + /* jump to arch/platform specific callbacks if any */ > + if (nd_pmu && nd_pmu->read) > +

Re: [RESEND PATCH v4 10/11] powerpc: Protect patching_mm with a lock

2021-05-07 Thread Peter Zijlstra
On Fri, May 07, 2021 at 03:03:51PM -0500, Christopher M. Riedl wrote: > On Thu May 6, 2021 at 5:51 AM CDT, Peter Zijlstra wrote: > > On Wed, May 05, 2021 at 11:34:51PM -0500, Christopher M. Riedl wrote: > > > Powerpc allows for multiple CPUs to patch concurrently. When p

Re: [RESEND PATCH v4 10/11] powerpc: Protect patching_mm with a lock

2021-05-06 Thread Peter Zijlstra
On Wed, May 05, 2021 at 11:34:51PM -0500, Christopher M. Riedl wrote: > Powerpc allows for multiple CPUs to patch concurrently. When patching > with STRICT_KERNEL_RWX a single patching_mm is allocated for use by all > CPUs for the few times that patching occurs. Use a spinlock to protect > the patc

Re: [OpenRISC] [PATCH v6 1/9] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

2021-04-07 Thread Peter Zijlstra
On Wed, Apr 07, 2021 at 08:52:08AM +0900, Stafford Horne wrote: > Why doesn't RISC-V add the xchg16 emulation code similar to OpenRISC? For > OpenRISC we added xchg16 and xchg8 emulation code to enable qspinlocks. So > one thought is with CONFIG_ARCH_USE_QUEUED_SPINLOCKS_XCHG32=y, can we remove

Re: [PATCH V2 1/5] powerpc/perf: Expose processor pipeline stage cycles using PERF_SAMPLE_WEIGHT_STRUCT

2021-03-25 Thread Peter Zijlstra
On Thu, Mar 25, 2021 at 10:01:35AM -0300, Arnaldo Carvalho de Melo wrote: > Em Wed, Mar 24, 2021 at 10:05:23AM +0530, Madhavan Srinivasan escreveu: > > > > On 3/22/21 8:27 PM, Athira Rajeev wrote: > > > Performance Monitoring Unit (PMU) registers in powerpc provides > > > information on cycles ela

Re: [PATCH V2] powerpc/perf: Fix handling of privilege level checks in perf interrupt context

2021-02-26 Thread Peter Zijlstra
r) != 0) > + is_kernel_addr(addr) && event->attr.exclude_kernel) > continue; > > /* Branches are read most recent first (ie. mfbhrb 0 is Acked-by: Peter Zijlstra (Intel)

Re: [PATCH] powerpc/perf: Fix handling of privilege level checks in perf interrupt context

2021-02-23 Thread Peter Zijlstra
On Tue, Feb 23, 2021 at 01:31:49AM -0500, Athira Rajeev wrote: > Running "perf mem record" in powerpc platforms with selinux enabled > resulted in soft lockup's. Below call-trace was seen in the logs: > > CPU: 58 PID: 3751 Comm: sssd_nss Not tainted 5.11.0-rc7+ #2 > NIP: c0dff3d4 LR: c000

Re: [RFC 11/20] mm/tlb: remove arch-specific tlb_start/end_vma()

2021-02-02 Thread Peter Zijlstra
On Tue, Feb 02, 2021 at 09:54:36AM +0000, Nadav Amit wrote: > > On Feb 2, 2021, at 1:31 AM, Peter Zijlstra wrote: > > > > On Tue, Feb 02, 2021 at 07:20:55AM +0000, Nadav Amit wrote: > >> Arm does not define tlb_end_vma, and consequently it flushes the TLB after >

Re: [RFC PATCH 3/6] mm/mremap: Use pmd/pud_poplulate to update page table entries

2021-02-02 Thread Peter Zijlstra
On Tue, Feb 02, 2021 at 02:41:13PM +0530, Aneesh Kumar K.V wrote: > pmd/pud_populate is the right interface to be used to set the respective > page table entries. Some architectures do assume that set_pmd/pud_at > can only be used to set a hugepage PTE. Since we are not setting up a hugepage > PTE

Re: [RFC 11/20] mm/tlb: remove arch-specific tlb_start/end_vma()

2021-02-02 Thread Peter Zijlstra
On Tue, Feb 02, 2021 at 07:20:55AM +0000, Nadav Amit wrote: > Arm does not define tlb_end_vma, and consequently it flushes the TLB after > each VMA. I suspect it is not intentional. ARM is one of those that look at the VM_EXEC bit to explicitly flush ITLB IIRC, so it has to.

Re: [RFC 00/20] TLB batching consolidation and enhancements

2021-02-01 Thread Peter Zijlstra
On Sun, Jan 31, 2021 at 07:57:01AM +0000, Nadav Amit wrote: > > On Jan 30, 2021, at 7:30 PM, Nicholas Piggin wrote: > > I'll go through the patches a bit more closely when they all come > > through. Sparc and powerpc of course need the arch lazy mode to get > > per-page/pte information for oper

Re: [RFC 11/20] mm/tlb: remove arch-specific tlb_start/end_vma()

2021-02-01 Thread Peter Zijlstra
On Sat, Jan 30, 2021 at 04:11:23PM -0800, Nadav Amit wrote: > diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h > index 427bfcc6cdec..b97136b7010b 100644 > --- a/include/asm-generic/tlb.h > +++ b/include/asm-generic/tlb.h > @@ -334,8 +334,8 @@ static inline void __tlb_reset_range(

Re: [PATCH 0/3] sched: Task priority related cleanups

2021-01-29 Thread Peter Zijlstra
On Thu, Jan 28, 2021 at 02:10:37PM +0100, Dietmar Eggemann wrote: > Dietmar Eggemann (3): > sched: Remove MAX_USER_RT_PRIO > sched: Remove USER_PRIO, TASK_USER_PRIO and MAX_USER_PRIO > sched/core: Update task_prio() function header Thanks!

Re: [RFC please help] membarrier: Rewrite sync_core_before_usermode()

2021-01-05 Thread Peter Zijlstra
On Tue, Jan 05, 2021 at 08:20:51AM -0800, Andy Lutomirski wrote: > > Interestingly, the architecture recently added a control bit to remove > > this synchronisation from exception return, so if we set that then we'd > > have a problem with SYNC_CORE and adding an ISB would be necessary

Re: [RFC][PATCH 1/2] libnvdimm: Introduce ND_CMD_GET_STAT to retrieve nvdimm statistics

2020-12-08 Thread Peter Zijlstra
On Mon, Dec 07, 2020 at 04:54:21PM -0800, Dan Williams wrote: > [ add perf maintainers ] > > On Sun, Nov 8, 2020 at 1:16 PM Vaibhav Jain wrote: > > > > Implement support for exposing generic nvdimm statistics via newly > > introduced dimm-command ND_CMD_GET_STAT that can be handled by nvdimm > >

Re: [PATCH AUTOSEL 5.9 27/39] sched/idle: Fix arch_cpu_idle() vs tracing

2020-12-03 Thread Peter Zijlstra
On Thu, Dec 03, 2020 at 03:54:42PM +0100, Heiko Carstens wrote: > On Thu, Dec 03, 2020 at 08:28:21AM -0500, Sasha Levin wrote: > > From: Peter Zijlstra > > > > [ Upstream commit 58c644ba512cfbc2e39b758dd979edd1d6d00e27 ] > > > > We call arch_cpu_idle(

Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting

2020-12-03 Thread Peter Zijlstra
On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote: > power: same as ARM, except that the loop may be rather larger since > the systems are bigger. But I imagine it's still faster than Nick's > approach -- a cmpxchg to a remote cacheline should still be faster than > an IPI shootdown

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-02 Thread Peter Zijlstra
On Wed, Dec 02, 2020 at 06:38:12AM -0800, Andy Lutomirski wrote: > > > On Dec 2, 2020, at 6:20 AM, Peter Zijlstra wrote: > > > > On Sun, Nov 29, 2020 at 02:01:39AM +1000, Nicholas Piggin wrote: > >> + * - A delayed freeing and RCU-like quiescing sequence

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-02 Thread Peter Zijlstra
On Sun, Nov 29, 2020 at 02:01:39AM +1000, Nicholas Piggin wrote: > + * - A delayed freeing and RCU-like quiescing sequence based on > + * mm switching to avoid IPIs completely. That one's interesting too. So basically you want to count switch_mm() invocations on each CP

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-02 Thread Peter Zijlstra
On Wed, Dec 02, 2020 at 12:17:31PM +0100, Peter Zijlstra wrote: > So the obvious 'improvement' here would be something like: > > for_each_online_cpu(cpu) { > p = rcu_dereference(cpu_rq(cpu)->curr); > if (p->active_mm != mm)

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-12-02 Thread Peter Zijlstra
On Sun, Nov 29, 2020 at 02:01:39AM +1000, Nicholas Piggin wrote: > +static void shoot_lazy_tlbs(struct mm_struct *mm) > +{ > + if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) { > + /* > + * IPI overheads have not found to be expensive, but they could > + * b

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-11-30 Thread Peter Zijlstra
On Mon, Nov 30, 2020 at 10:30:00AM +0100, Peter Zijlstra wrote: > On Sat, Nov 28, 2020 at 07:54:57PM -0800, Andy Lutomirski wrote: > > This means that mm_cpumask operations won't need to be full barriers > > forever, and we might not want to take the implied full barriers

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-11-30 Thread Peter Zijlstra
On Sat, Nov 28, 2020 at 07:54:57PM -0800, Andy Lutomirski wrote: > This means that mm_cpumask operations won't need to be full barriers > forever, and we might not want to take the implied full barriers in > set_bit() and clear_bit() for granted. There is no implied full barrier for those ops.
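The point in the reply above, that plain set_bit() and clear_bit() carry no implied full barrier, can be illustrated with a user-space analogue. This is only a sketch using C11 atomics, not kernel code: `demo_cpumask` and the `demo_*` helpers are hypothetical names, and the explicit fence stands in for the kernel's smp_mb__before_atomic() / smp_mb__after_atomic() pairing.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a CPU mask: one bit per CPU. */
static atomic_ulong demo_cpumask;

/* Atomic set with relaxed ordering: the update itself is atomic,
 * but it implies no ordering against surrounding memory accesses. */
static void demo_set_bit_relaxed(unsigned int nr)
{
    atomic_fetch_or_explicit(&demo_cpumask, 1UL << nr, memory_order_relaxed);
}

/* Same update, followed by an explicit full fence for callers that
 * actually depend on ordering. */
static void demo_set_bit_ordered(unsigned int nr)
{
    atomic_fetch_or_explicit(&demo_cpumask, 1UL << nr, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Relaxed read of a single bit. */
static bool demo_test_bit(unsigned int nr)
{
    return (atomic_load_explicit(&demo_cpumask, memory_order_relaxed)
            & (1UL << nr)) != 0;
}
```

The design point is that ordering is requested explicitly at the call site that needs it, rather than assumed from the bit operation.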

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-11-30 Thread Peter Zijlstra
On Sat, Nov 28, 2020 at 07:54:57PM -0800, Andy Lutomirski wrote: > Version (b) seems fairly straightforward to implement -- add RCU > protection and a atomic_t special_ref_cleared (initially 0) to struct > mm_struct itself. After anyone clears a bit to mm_cpumask (which is > already a barrier), N

Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-11-30 Thread Peter Zijlstra
On Sun, Nov 29, 2020 at 12:16:26PM -0800, Andy Lutomirski wrote: > On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski wrote: > > > > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin wrote: > > > > > > On big systems, the mm refcount can become highly contented when doing > > > a lot of context switch

Re: [PATCH v2 3/6] perf/core: Fix arch_perf_get_page_size()

2020-11-26 Thread Peter Zijlstra
On Thu, Nov 26, 2020 at 12:56:06PM +0000, Matthew Wilcox wrote: > On Thu, Nov 26, 2020 at 01:42:07PM +0100, Peter Zijlstra wrote: > > + pgdp = pgd_offset(mm, addr); > > + pgd = READ_ONCE(*pgdp); > > I forget how x86-32-PAE maps to Linux's PGD/P4D/PUD/PMD scheme, b

Re: [PATCH v2 1/6] mm/gup: Provide gup_get_pte() more generic

2020-11-26 Thread Peter Zijlstra
On Thu, Nov 26, 2020 at 12:43:00PM +0000, Matthew Wilcox wrote: > On Thu, Nov 26, 2020 at 01:01:15PM +0100, Peter Zijlstra wrote: > > +#ifdef CONFIG_GUP_GET_PTE_LOW_HIGH > > +/* > > + * WARNING: only to be used in the get_user_pages_fast() implementation. > > + * Wit

Re: [PATCH v2 4/6] arm64/mm: Implement pXX_leaf_size() support

2020-11-26 Thread Peter Zijlstra
Now with pmd_cont() defined... --- Subject: arm64/mm: Implement pXX_leaf_size() support From: Peter Zijlstra Date: Fri Nov 13 11:46:06 CET 2020 ARM64 has non-pagetable aligned large page support with PTE_CONT, when this bit is set the page is part of a super-page. Match the hugetlb code and

Re: [PATCH v2 3/6] perf/core: Fix arch_perf_get_page_size()

2020-11-26 Thread Peter Zijlstra
On Thu, Nov 26, 2020 at 12:34:58PM +0000, Matthew Wilcox wrote: > On Thu, Nov 26, 2020 at 01:01:17PM +0100, Peter Zijlstra wrote: > > The (new) page-table walker in arch_perf_get_page_size() is broken in > > various ways. Specifically while it is used in a lockless manner, it >

[PATCH v2 1/6] mm/gup: Provide gup_get_pte() more generic

2020-11-26 Thread Peter Zijlstra
In order to write another lockless page-table walker, we need gup_get_pte() exposed. While doing that, rename it to ptep_get_lockless() to match the existing ptep_get() naming. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/pgtable.h | 55

[PATCH v2 0/6] perf/mm: Fix PERF_SAMPLE_*_PAGE_SIZE

2020-11-26 Thread Peter Zijlstra
Hi, These patches provide generic infrastructure to determine TLB page size from page table entries alone. Perf will use this (for either data or code address) to aid in profiling TLB issues. While most architectures only have page table aligned large pages, some (notably ARM64, Sparc64 and Power

[PATCH v2 4/6] arm64/mm: Implement pXX_leaf_size() support

2020-11-26 Thread Peter Zijlstra
-by: Peter Zijlstra (Intel) --- arch/arm64/include/asm/pgtable.h |3 +++ 1 file changed, 3 insertions(+) --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -503,6 +503,9 @@ extern pgprot_t phys_mem_access_prot(str PMD_TYPE_SECT

[PATCH v2 3/6] perf/core: Fix arch_perf_get_page_size()

2020-11-26 Thread Peter Zijlstra
"perf,mm: Handle non-page-table-aligned hugetlbfs") Fixes: 8d97e71811aa ("perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE") Signed-off-by: Peter Zijlstra (Intel) Tested-by: Kan Liang --- arch/arm64/include/asm/pgtable.h|3 + arch/sparc/include/asm/pgtable_64.h | 13 ar

[PATCH v2 5/6] sparc64/mm: Implement pXX_leaf_size() support

2020-11-26 Thread Peter Zijlstra
Sparc64 has non-pagetable aligned large page support; wire up the pXX_leaf_size() functions to report the correct pagetable page size. This enables PERF_SAMPLE_{DATA,CODE}_PAGE_SIZE to report accurate pagetable leaf sizes. Signed-off-by: Peter Zijlstra (Intel) --- arch/sparc/include/asm

[PATCH v2 6/6] powerpc/8xx: Implement pXX_leaf_size() support

2020-11-26 Thread Peter Zijlstra
8M entry. > > In the PTE, we have two bits: _PAGE_SPS and _PAGE_HUGE > > _PAGE_HUGE means it is a 512k page > _PAGE_SPS means it is not a 4k page > > The kernel can be built either with 4k pages as standard page size, or > 16k pages. It doesn't change the page table l

[PATCH v2 2/6] mm: Introduce pXX_leaf_size()

2020-11-26 Thread Peter Zijlstra
A number of architectures have non-pagetable aligned huge/large pages. For such architectures a leaf can actually be part of a larger entry. Provide generic helpers to determine the size of a page-table leaf. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/pgtable.h | 16
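One way to read "generic helpers" in the changelog above is an override-with-fallback macro pattern, sketched here in user-space C. This is an illustration under stated assumptions, not the patch itself: the `DEMO_*` shift values, the `demo_pte_t` type, the contiguous bit position and the group of 16 entries are all hypothetical, and only the `pXX_leaf_size()` naming comes from the series.

```c
#include <stdint.h>

/* Hypothetical geometry: 4 KiB base pages, 2 MiB PMD-level entries. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PMD_SHIFT  21
#define DEMO_PAGE_SIZE  (1ULL << DEMO_PAGE_SHIFT)

/* Hypothetical PTE type with a made-up "contiguous" marker bit. */
typedef struct { uint64_t val; } demo_pte_t;
#define DEMO_PTE_CONT  (1ULL << 52)
#define DEMO_CONT_PTES 16ULL

/*
 * Architecture override: when the (assumed) contiguous bit is set, the
 * leaf is part of a larger entry, so report the size of the whole group.
 */
#define pte_leaf_size(pte) \
    (((pte).val & DEMO_PTE_CONT) ? DEMO_CONT_PTES * DEMO_PAGE_SIZE \
                                 : DEMO_PAGE_SIZE)

/* Generic fallbacks: only used when no override was defined above. */
#ifndef pmd_leaf_size
#define pmd_leaf_size(pmd) (1ULL << DEMO_PMD_SHIFT)
#endif
#ifndef pte_leaf_size
#define pte_leaf_size(pte) DEMO_PAGE_SIZE
#endif
```

With this shape, an architecture that only has page-table aligned large pages defines nothing and gets the per-level default, while one with larger-than-leaf entries overrides just the level it needs.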

Re: [PATCH 0/5] perf/mm: Fix PERF_SAMPLE_*_PAGE_SIZE

2020-11-26 Thread Peter Zijlstra
On Fri, Nov 20, 2020 at 01:20:04PM +0100, Peter Zijlstra wrote: > > > I can help with powerpc 8xx. It is a 32 bits powerpc. The PGD has 1024 > > > entries, that means each entry maps 4M. > > > > > > Page sizes are 4k, 16k, 512k and 8M. > > > >

Re: [PATCH 0/5] perf/mm: Fix PERF_SAMPLE_*_PAGE_SIZE

2020-11-20 Thread Peter Zijlstra
On Fri, Nov 20, 2020 at 12:18:22PM +0100, Christophe Leroy wrote: > Hi Peter, > > Le 13/11/2020 à 14:44, Christophe Leroy a écrit : > > Hi > > > > Le 13/11/2020 à 12:19, Peter Zijlstra a écrit : > > > Hi, > > > > > > These patches pro

Re: [PATCH 1/2] kernel/cpu: add arch override for clear_tasks_mm_cpumask() mm handling

2020-11-20 Thread Peter Zijlstra
r. > > Add an arch override allowing powerpc to use clear_tasks_mm_cpumask(). > > Signed-off-by: Nicholas Piggin Seems reasonable enough.. Acked-by: Peter Zijlstra (Intel) > --- > kernel/cpu.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > di

Re: [PATCH 0/5] perf/mm: Fix PERF_SAMPLE_*_PAGE_SIZE

2020-11-16 Thread Peter Zijlstra
On Mon, Nov 16, 2020 at 08:36:36AM -0800, Dave Hansen wrote: > On 11/16/20 8:32 AM, Matthew Wilcox wrote: > >> > >> That's really the best we can do from software without digging into > >> microarchitecture-specific events. > > I mean this is perf. Digging into microarch specific events is what it

Re: [PATCH 0/5] perf/mm: Fix PERF_SAMPLE_*_PAGE_SIZE

2020-11-16 Thread Peter Zijlstra
On Mon, Nov 16, 2020 at 08:28:23AM -0800, Dave Hansen wrote: > On 11/16/20 7:54 AM, Matthew Wilcox wrote: > > It gets even more complicated with CPUs with multiple levels of TLB > > which support different TLB entry sizes. My CPU reports: > > > > TLB info > > Instruction TLB: 2M/4M pages, fully

Re: [PATCH 2/5] mm: Introduce pXX_leaf_size()

2020-11-13 Thread Peter Zijlstra
On Fri, Nov 13, 2020 at 12:19:03PM +0100, Peter Zijlstra wrote: > A number of architectures have non-pagetable aligned huge/large pages. > For such architectures a leaf can actually be part of a larger TLB > entry. > > Provide generic helpers to determine the TLB size of a pa

[PATCH 5/5] sparc64/mm: Implement pXX_leaf_size() support

2020-11-13 Thread Peter Zijlstra
Sparc64 has non-pagetable aligned large page support; wire up the pXX_leaf_size() functions to report the correct TLB page size. This enables PERF_SAMPLE_{DATA,CODE}_PAGE_SIZE to report accurate TLB page sizes. Signed-off-by: Peter Zijlstra (Intel) --- arch/sparc/include/asm/pgtable_64.h

[PATCH 0/5] perf/mm: Fix PERF_SAMPLE_*_PAGE_SIZE

2020-11-13 Thread Peter Zijlstra
Hi, These patches provide generic infrastructure to determine TLB page size from page table entries alone. Perf will use this (for either data or code address) to aid in profiling TLB issues. While most architectures only have page table aligned large pages, some (notably ARM64, Sparc64 and Power

[PATCH 3/5] perf/core: Fix arch_perf_get_page_size()

2020-11-13 Thread Peter Zijlstra
Handle non-page-table-aligned hugetlbfs") Fixes: 8d97e71811aa ("perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE") Signed-off-by: Peter Zijlstra (Intel) --- arch/arm64/include/asm/pgtable.h|3 + arch/sparc/include/asm/pgtable_64.h | 13 arch/sparc/mm/hugetlbpage.c |

[PATCH 1/5] mm/gup: Provide gup_get_pte() more generic

2020-11-13 Thread Peter Zijlstra
In order to write another lockless page-table walker, we need gup_get_pte() exposed. While doing that, rename it to ptep_get_lockless() to match the existing ptep_get() naming. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/pgtable.h | 55

[PATCH 4/5] arm64/mm: Implement pXX_leaf_size() support

2020-11-13 Thread Peter Zijlstra
: Peter Zijlstra (Intel) --- arch/arm64/include/asm/pgtable.h |3 +++ 1 file changed, 3 insertions(+) --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -503,6 +503,9 @@ extern pgprot_t phys_mem_access_prot(str PMD_TYPE_SECT

[PATCH 2/5] mm: Introduce pXX_leaf_size()

2020-11-13 Thread Peter Zijlstra
A number of architectures have non-pagetable aligned huge/large pages. For such architectures a leaf can actually be part of a larger TLB entry. Provide generic helpers to determine the TLB size of a page-table leaf. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/pgtable.h | 16

Re: [PATCH 1/3] asm-generic/atomic64: Add support for ARCH_ATOMIC

2020-11-11 Thread Peter Zijlstra
On Wed, Nov 11, 2020 at 02:39:01PM +0100, Christophe Leroy wrote: > Hello, > > Le 11/11/2020 à 12:07, Nicholas Piggin a écrit : > > This passes atomic64 selftest on ppc32 on qemu (uniprocessor only) > > both before and after powerpc is converted to use ARCH_ATOMIC. > > Can you explain what this c

Re: [PATCH 3/8] powerpc: Mark functions called inside uaccess blocks w/ 'notrace'

2020-10-16 Thread Peter Zijlstra
On Fri, Oct 16, 2020 at 07:56:16AM +0100, Christoph Hellwig wrote: > On Thu, Oct 15, 2020 at 10:01:54AM -0500, Christopher M. Riedl wrote: > > Functions called between user_*_access_begin() and user_*_access_end() > > should be either inlined or marked 'notrace' to prevent leaving > > userspace acc

Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-24 Thread Peter Zijlstra
On Thu, Sep 24, 2020 at 09:51:38AM -0400, Steven Rostedt wrote: > > It turns out, that getting selected for pull-balance is exactly that > > condition, and clearly a migrate_disable() task cannot be pulled, but we > > can use that signal to try and pull away the running task that's in the > > way.

Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-24 Thread Peter Zijlstra
On Thu, Sep 24, 2020 at 08:32:41AM -0400, Steven Rostedt wrote: > Anyway, instead of blocking. What about having a counter of number of > migrate disabled tasks per cpu, and when taking a migrate_disable(), and > there's > already another task with migrate_disabled() set, and the current task has

Re: [PATCH 1/2] lockdep: improve current->(hard|soft)irqs_enabled synchronisation with actual irq state

2020-07-25 Thread Peter Zijlstra
On Thu, Jul 23, 2020 at 08:56:14PM +1000, Nicholas Piggin wrote: > diff --git a/arch/powerpc/include/asm/hw_irq.h > b/arch/powerpc/include/asm/hw_irq.h > index 3a0db7b0b46e..35060be09073 100644 > --- a/arch/powerpc/include/asm/hw_irq.h > +++ b/arch/powerpc/include/asm/hw_irq.h > @@ -200,17 +200,14

Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-25 Thread Peter Zijlstra
On Fri, Jul 24, 2020 at 03:10:59PM -0400, Waiman Long wrote: > On 7/24/20 4:16 AM, Will Deacon wrote: > > On Thu, Jul 23, 2020 at 08:47:59PM +0200, pet...@infradead.org wrote: > > > On Thu, Jul 23, 2020 at 02:32:36PM -0400, Waiman Long wrote: > > > > BTW, do you have any comment on my v2 lock holde

Re: [PATCH 1/2] lockdep: improve current->(hard|soft)irqs_enabled synchronisation with actual irq state

2020-07-23 Thread Peter Zijlstra
On Thu, Jul 23, 2020 at 11:11:03PM +1000, Nicholas Piggin wrote: > Excerpts from Peter Zijlstra's message of July 23, 2020 9:40 pm: > > On Thu, Jul 23, 2020 at 08:56:14PM +1000, Nicholas Piggin wrote: > > > >> diff --git a/arch/powerpc/include/asm/hw_irq.h > >> b/arch/powerpc/include/asm/hw_irq.h

Re: [PATCH v3 5/6] powerpc/pseries: implement paravirt qspinlocks for SPLPAR

2020-07-23 Thread Peter Zijlstra
On Thu, Jul 09, 2020 at 12:06:13PM -0400, Waiman Long wrote: > We don't really need to do a pv_spinlocks_init() if pv_kick() isn't > supported. Waiman, if you cannot explain how not having kick is a sane thing, what are you saying here?

Re: [PATCH 1/2] lockdep: improve current->(hard|soft)irqs_enabled synchronisation with actual irq state

2020-07-23 Thread Peter Zijlstra
On Thu, Jul 23, 2020 at 08:56:14PM +1000, Nicholas Piggin wrote: > diff --git a/arch/powerpc/include/asm/hw_irq.h > b/arch/powerpc/include/asm/hw_irq.h > index 3a0db7b0b46e..35060be09073 100644 > --- a/arch/powerpc/include/asm/hw_irq.h > +++ b/arch/powerpc/include/asm/hw_irq.h > @@ -200,17 +200,1

Re: [PATCH v2 06/10] powerpc/smp: Generalize 2nd sched domain

2020-07-22 Thread Peter Zijlstra
On Wed, Jul 22, 2020 at 01:48:22PM +0530, Srikar Dronamraju wrote: > * pet...@infradead.org [2020-07-22 09:46:24]: > > > On Tue, Jul 21, 2020 at 05:08:10PM +0530, Srikar Dronamraju wrote: > > > Currently "CACHE" domain happens to be the 2nd sched domain as per > > > powerpc_topology. This domain

Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-07-21 Thread Peter Zijlstra
On Tue, Jul 21, 2020 at 11:15:13AM -0400, Mathieu Desnoyers wrote: > - On Jul 21, 2020, at 11:06 AM, Peter Zijlstra pet...@infradead.org wrote: > > > On Tue, Jul 21, 2020 at 08:04:27PM +1000, Nicholas Piggin wrote: > > > >> That being said, the x86 sync core g

Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-07-16 Thread Peter Zijlstra
On Wed, Jul 15, 2020 at 10:18:20PM -0700, Andy Lutomirski wrote: > > On Jul 15, 2020, at 9:15 PM, Nicholas Piggin wrote: > > CPU0 CPU1 > > 1. user stuff > > a. membarrier() 2. enter kernel > > b. read rq->curr 3. rq->curr switched to kt

Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper

2020-07-14 Thread Peter Zijlstra
On Tue, Jul 14, 2020 at 07:31:03PM +0300, Jarkko Sakkinen wrote: > On Tue, Jul 14, 2020 at 03:01:09PM +0200, Peter Zijlstra wrote: > > to help with text_alloc() usage in generic code, but I think > > fundamentally, there's only these two options. > > There is one

Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper

2020-07-14 Thread Peter Zijlstra
On Tue, Jul 14, 2020 at 03:19:24PM +0300, Ard Biesheuvel wrote: > So perhaps the answer is to have text_alloc() not with a 'where' > argument but with a 'why' argument. Or more simply, just have separate > alloc/free APIs for each case, with generic versions that can be > overridden by the architec
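
A rough userspace sketch of the "'why' argument" idea quoted above: the caller states the reason for the allocation and the allocator derives the placement constraint from it. Every name here (text_alloc_reason, text_placement, text_region) is hypothetical, malloc() stands in for a real executable mapping, and the rule that only module text needs to sit near the kernel image is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reasons a caller might ask for executable memory. */
enum text_alloc_reason {
	TEXT_ALLOC_MODULE,	/* assumed to need branch range to kernel text */
	TEXT_ALLOC_GENERIC,	/* assumed to have no placement restriction */
};

/* Hypothetical placement policies derived from the reason. */
enum text_placement {
	TEXT_NEAR_KERNEL,
	TEXT_ANYWHERE,
};

struct text_region {
	void *addr;
	size_t size;
	enum text_placement placement;
};

/* Map the caller's "why" to a placement policy. */
static enum text_placement text_alloc_placement(enum text_alloc_reason why)
{
	switch (why) {
	case TEXT_ALLOC_MODULE:
		return TEXT_NEAR_KERNEL;
	case TEXT_ALLOC_GENERIC:
		return TEXT_ANYWHERE;
	}
	return TEXT_ANYWHERE;
}

/* Allocate a region; returns 0 on success, -1 on allocation failure. */
static int text_alloc(struct text_region *r, size_t size,
		      enum text_alloc_reason why)
{
	r->addr = malloc(size);	/* stand-in for an executable mapping */
	if (!r->addr)
		return -1;
	r->size = size;
	r->placement = text_alloc_placement(why);
	return 0;
}

static void text_free(struct text_region *r)
{
	free(r->addr);
	r->addr = NULL;
	r->size = 0;
}
```

The alternative raised in the same message, separate alloc/free APIs per use case, would replace the enum with one function pair per case; the sketch only illustrates the single-entry-point variant.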

Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option

2020-07-14 Thread Peter Zijlstra
On Tue, Jul 14, 2020 at 05:46:05AM -0700, Andy Lutomirski wrote: > x86 has this exact problem. At least no more than 64*8 CPUs share the cache > line :) I've seen patches for a 'sparse' bitmap to solve related problems. It's basically the same code, except it multiplies everything (size, bit-nr)
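
A small userspace sketch of the 'sparse' bitmap idea mentioned above, assuming the multiplication is applied to both the bitmap size and the bit number so that logical bits are spread out in memory. The preview is cut off before the factor is named, so SPARSE_FACTOR is a hypothetical choice (512 physical bits per logical bit, i.e. one logical bit per 64-byte cache line), and the helpers are plain non-atomic code rather than the kernel's bitops.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical spreading factor: 512 bits = 64 bytes, one cache line. */
#define SPARSE_FACTOR	512UL
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

struct sparse_bitmap {
	unsigned long *words;
	size_t nbits;		/* number of logical bits */
};

/* Index of the word that holds logical bit nr. */
static size_t sparse_word(size_t nr)
{
	return (nr * SPARSE_FACTOR) / BITS_PER_WORD;
}

/* Mask selecting logical bit nr within its word. */
static unsigned long sparse_mask(size_t nr)
{
	return 1UL << ((nr * SPARSE_FACTOR) % BITS_PER_WORD);
}

/* The storage size is multiplied by the same factor as the bit number. */
static int sparse_bitmap_init(struct sparse_bitmap *sb, size_t nbits)
{
	size_t phys_bits = nbits * SPARSE_FACTOR;
	size_t nwords = (phys_bits + BITS_PER_WORD - 1) / BITS_PER_WORD;

	sb->words = calloc(nwords, sizeof(unsigned long));
	if (!sb->words)
		return -1;
	sb->nbits = nbits;
	return 0;
}

static void sparse_set_bit(struct sparse_bitmap *sb, size_t nr)
{
	assert(nr < sb->nbits);
	sb->words[sparse_word(nr)] |= sparse_mask(nr);
}

static void sparse_clear_bit(struct sparse_bitmap *sb, size_t nr)
{
	assert(nr < sb->nbits);
	sb->words[sparse_word(nr)] &= ~sparse_mask(nr);
}

static int sparse_test_bit(const struct sparse_bitmap *sb, size_t nr)
{
	assert(nr < sb->nbits);
	return (sb->words[sparse_word(nr)] & sparse_mask(nr)) != 0;
}
```

With this factor, adjacent logical bits land 64 bytes apart, so two CPUs updating neighbouring bits would not write to the same cache line; the cost is a bitmap that is SPARSE_FACTOR times larger.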

Re: [PATCH 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper

2020-07-14 Thread Peter Zijlstra
On Tue, Jul 14, 2020 at 11:33:33AM +0100, Russell King - ARM Linux admin wrote: > For 32-bit ARM, our bpf code uses "blx/bx" (or equivalent code > sequences) rather than encoding a "bl" or "b", so BPF there doesn't > care where the executable memory is mapped, and doesn't need any > PLTs. Given th

Re: [PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper

2020-07-14 Thread Peter Zijlstra
On Tue, Jul 14, 2020 at 11:28:27AM +0100, Will Deacon wrote: > As Ard says, module_alloc() _is_ special, in the sense that the virtual > memory it allocates wants to be close to the kernel text, whereas the > concept of allocating executable memory is broader and doesn't have these > restrictions.

Re: [RFC PATCH 5/7] lazy tlb: introduce lazy mm refcount helper functions

2020-07-10 Thread Peter Zijlstra
On Fri, Jul 10, 2020 at 11:56:44AM +1000, Nicholas Piggin wrote: > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c > index 73199470c265..ad95812d2a3f 100644 > --- a/arch/powerpc/kernel/smp.c > +++ b/arch/powerpc/kernel/smp.c > @@ -1253,7 +1253,7 @@ void start_secondary(void *unu

Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode

2020-07-10 Thread Peter Zijlstra
On Fri, Jul 10, 2020 at 11:56:43AM +1000, Nicholas Piggin wrote: > And get rid of the generic sync_core_before_usermode facility. > > This helper is the wrong way around I think. The idea that membarrier > state requires a core sync before returning to user is the easy one > that does not need hid
