[PATCH 10/12] tracing: Convert the per CPU "disabled" counter to local from atomic

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt The per CPU "disabled" counter is used for the latency tracers and stack tracers to make sure that their accounting isn't messed up by an NMI or interrupt coming in and affecting the same CPU data. But the counter is an atomic_t type. As it only needs to synchronize against t

[PATCH 09/12] tracing: branch: Use trace_tracing_is_on_cpu() instead of "disabled" field

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt The branch tracer currently checks the per CPU "disabled" field to know if tracing is enabled or not for the CPU. As the "disabled" value is not used anymore to turn off tracing generically, use tracing_tracer_is_on_cpu() instead. Signed-off-by: Steven Rostedt (Google) ---

[PATCH 12/12] tracing: Remove unused buffer_page field from trace_array_cpu structure

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt The trace_array_cpu had a "buffer_page" field that was originally going to be used as a backup page for the ring buffer. But the ring buffer has its own way of reusing pages and this field was never used. Remove it. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/

[PATCH 08/12] ring-buffer: Add ring_buffer_record_is_on_cpu()

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt Add the function ring_buffer_record_is_on_cpu() that returns true if the ring buffer for a given CPU is writable and false otherwise. Also add tracer_tracing_is_on_cpu() to return if the ring buffer for a given CPU is writable for a given trace_array. Signed-off-by: Steven

[PATCH 11/12] tracing: Use atomic_inc_return() for updating "disabled" counter in irqsoff tracer

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt The irqsoff tracer uses the per CPU "disabled" field to prevent corruption of the accounting when it starts to trace interrupts disabled, but there's a slight race that could happen if for some reason it was called twice. Use atomic_inc_return() instead. Signed-off-by: Steve

[PATCH 07/12] tracing: Do not use per CPU array_buffer.data->disabled for cpumask

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt The per CPU "disabled" value was the original way to disable tracing when the tracing subsystem was first created. Today, the ring buffer infrastructure has its own way to disable tracing. In fact, things have changed so much since 2008 that many things ignore the disable fla

[PATCH 06/12] ftrace: Do not disable function graph based on "disabled" field

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt The per CPU "disabled" value was the original way to disable tracing when the tracing subsystem was first created. Today, the ring buffer infrastructure has its own way to disable tracing. In fact, things have changed so much since 2008 that many things ignore the disable fla

[PATCH 03/12] ftrace: Do not bother checking per CPU "disabled" flag

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt The per CPU "disabled" value was the original way to disable tracing when the tracing subsystem was first created. Today, the ring buffer infrastructure has its own way to disable tracing. In fact, things have changed so much since 2008 that many things ignore the disable fla

[PATCH 02/12] tracing: Do not bother setting "disabled" field for ftrace_dump_one()

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt The per CPU "disabled" value was the original way to disable tracing when the tracing subsystem was first created. Today, the ring buffer infrastructure has its own way to disable tracing. In fact, things have changed so much since 2008 that many things ignore the disable fla

[PATCH 05/12] tracing: kdb: Use tracer_tracing_on/off() instead of setting per CPU disabled

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt The per CPU "disabled" value was the original way to disable tracing when the tracing subsystem was first created. Today, the ring buffer infrastructure has its own way to disable tracing. In fact, things have changed so much since 2008 that many things ignore the disable fla

[PATCH 04/12] tracing: Just use this_cpu_read() to access ignore_pid

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt The ignore_pid boolean on the per CPU data descriptor is updated at sched_switch when a new task is scheduled in. If the new task is to be ignored, it is set to true, otherwise it is set to false. The current task should always have the correct value as it is updated when the

[PATCH 00/12] tracing: Remove most uses of "disabled" field

2025-05-02 Thread Steven Rostedt
Looking into allowing syscall events to fault and read user space, I found that the use of the per CPU data "disabled" field was mostly obsolete. This goes back to 2008 when the tracing subsystem was first created. The "disabled" field was the only way to know if tracing was disabled or not. But

[PATCH 01/12] tracing/mmiotrace: Remove reference to unused per CPU data pointer

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt The mmiotracer referenced the per CPU array_buffer->data descriptor but never actually used it. Remove the references to it. Signed-off-by: Steven Rostedt (Google) --- kernel/trace/trace_mmiotrace.c | 12 ++-- 1 file changed, 2 insertions(+), 10 deletions(-) diff

Re: [PATCH v6 5/5] perf: Support deferred user callchains for per CPU events

2025-05-02 Thread Namhyung Kim
On Thu, May 01, 2025 at 04:57:30PM -0400, Steven Rostedt wrote: > On Thu, 1 May 2025 13:14:11 -0700 > Namhyung Kim wrote: > > > Hi Steve, > > > > On Wed, Apr 30, 2025 at 09:32:07PM -0400, Steven Rostedt wrote: > > > > To solve this, when a per CPU event is created that has defer_callchain > > >

Re: [PATCH v5 00/25] context_tracking,x86: Defer some IPIs until a user->kernel transition

2025-05-02 Thread Dave Hansen
gah, the cc list here is rotund... On 5/2/25 09:38, Valentin Schneider wrote: ... >> All of the paths to enter the kernel from userspace have some >> SWITCH_TO_KERNEL_CR3 variant. If they didn't, the userspace that they >> entered from could have attacked the kernel with Meltdown. >> >> I'm theori

[PATCH v7 10/17] unwind_user/deferred: Make unwind deferral requests NMI-safe

2025-05-02 Thread Steven Rostedt
From: Josh Poimboeuf Make unwind_deferred_request() NMI-safe so tracers in NMI context can call it to get the cookie immediately rather than have to do the fragile "schedule irq work and then call unwind_deferred_request()" dance. Signed-off-by: Josh Poimboeuf Signed-off-by: Steven Rostedt (Goo

[PATCH v7 17/17] perf: Skip user unwind if the task is a kernel thread.

2025-05-02 Thread Steven Rostedt
From: Josh Poimboeuf If the task is not a user thread, there's no user stack to unwind. Signed-off-by: Josh Poimboeuf Signed-off-by: Steven Rostedt (Google) --- kernel/events/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/events/core.c b/kernel/events/core

[PATCH v7 15/17] perf: Use current->flags & PF_KTHREAD instead of current->mm == NULL

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt To determine if a task is a kernel thread or not, it is more reliable to use (current->flags & PF_KTHREAD) than to rely on current->mm being NULL. That is because some kernel tasks (io_uring helpers) may have a mm field. Link: https://lore.kernel.org/linux-trace-kernel/2025

[PATCH v7 16/17] perf: Simplify get_perf_callchain() user logic

2025-05-02 Thread Steven Rostedt
From: Josh Poimboeuf Simplify the get_perf_callchain() user logic a bit. task_pt_regs() should never be NULL. Acked-by: Namhyung Kim Signed-off-by: Josh Poimboeuf Signed-off-by: Steven Rostedt (Google) --- kernel/events/callchain.c | 18 -- 1 file changed, 8 insertions(+), 1

[PATCH v7 14/17] perf: Have get_perf_callchain() return NULL if crosstask and user are set

2025-05-02 Thread Steven Rostedt
From: Josh Poimboeuf get_perf_callchain() doesn't support cross-task unwinding for user space stacks, have it return NULL if both the crosstask and user arguments are set. Signed-off-by: Josh Poimboeuf Signed-off-by: Steven Rostedt (Google) --- kernel/events/callchain.c | 8 1 file c

[PATCH v7 13/17] perf: Remove get_perf_callchain() init_nr argument

2025-05-02 Thread Steven Rostedt
From: Josh Poimboeuf The 'init_nr' argument has double duty: it's used to initialize both the number of contexts and the number of stack entries. That's confusing and the callers always pass zero anyway. Hard code the zero. Acked-by: Namhyung Kim Signed-off-by: Josh Poimboeuf Signed-off-by:

[PATCH v7 12/17] unwind deferred: Use SRCU unwind_deferred_task_work()

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt Instead of using the callback_mutex to protect the linked list of callbacks in unwind_deferred_task_work(), use SRCU. This gets called every time a task exits that has to record a stack trace that was requested. This can happen for many tasks on several CPUs at the same

[PATCH v7 11/17] unwind deferred: Use bitmask to determine which callbacks to call

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt In order to know which registered callback requested a stacktrace for when the task goes back to user space, add a bitmask for all registered tracers. The bitmask is the size of long, which means that on a 32 bit machine, it can have at most 32 registered tracers, and on 64 bi

[PATCH v7 08/17] unwind_user/deferred: Add unwind cache

2025-05-02 Thread Steven Rostedt
From: Josh Poimboeuf Cache the results of the unwind to ensure the unwind is only performed once, even when called by multiple tracers. The cache nr_entries gets cleared every time the task exits the kernel. When a stacktrace is requested, nr_entries gets set to the number of entries in the stac

[PATCH v7 09/17] unwind_user/deferred: Add deferred unwinding interface

2025-05-02 Thread Steven Rostedt
From: Josh Poimboeuf Add an interface for scheduling task work to unwind the user space stack before returning to user space. This solves several problems for its callers: - Ensure the unwind happens in task context even if the caller may be running in NMI or interrupt context. - Avoid

[PATCH v7 07/17] unwind_user/deferred: Add unwind_deferred_trace()

2025-05-02 Thread Steven Rostedt
From: Steven Rostedt Add a function that must be called inside a faultable context that will retrieve a user space stack trace. The function unwind_deferred_trace() can be called by a tracer when a task is about to enter user space, or has just come back from user space and has interrupts enabled

[PATCH v7 04/17] perf/x86: Rename and move get_segment_base() and make it global

2025-05-02 Thread Steven Rostedt
From: Josh Poimboeuf get_segment_base() will be used by the unwind_user code, so make it global and rename it so it doesn't conflict with a KVM function of the same name. As the function is no longer specific to perf, move it to ptrace.c as that seems to be a better location for a generic functi

[PATCH v7 06/17] unwind_user/x86: Enable compat mode frame pointer unwinding on x86

2025-05-02 Thread Steven Rostedt
From: Josh Poimboeuf Use ARCH_INIT_USER_COMPAT_FP_FRAME to describe how frame pointers are unwound on x86, and implement the hooks needed to add the segment base addresses. Enable HAVE_UNWIND_USER_COMPAT_FP if the system has compat mode compiled in. Signed-off-by: Josh Poimboeuf Signed-off-by:

[PATCH v7 05/17] unwind_user: Add compat mode frame pointer support

2025-05-02 Thread Steven Rostedt
From: Josh Poimboeuf Add optional support for user space compat mode frame pointer unwinding. If supported, the arch needs to enable CONFIG_HAVE_UNWIND_USER_COMPAT_FP and define ARCH_INIT_USER_COMPAT_FP_FRAME. Signed-off-by: Josh Poimboeuf Signed-off-by: Steven Rostedt (Google) --- arch/Kconf

[PATCH v7 02/17] unwind_user: Add frame pointer support

2025-05-02 Thread Steven Rostedt
From: Josh Poimboeuf Add optional support for user space frame pointer unwinding. If supported, the arch needs to enable CONFIG_HAVE_UNWIND_USER_FP and define ARCH_INIT_USER_FP_FRAME. By encoding the frame offsets in struct unwind_user_frame, much of this code can also be reused for future unwi

[PATCH v7 03/17] unwind_user/x86: Enable frame pointer unwinding on x86

2025-05-02 Thread Steven Rostedt
From: Josh Poimboeuf Use ARCH_INIT_USER_FP_FRAME to describe how frame pointers are unwound on x86, and enable CONFIG_HAVE_UNWIND_USER_FP accordingly so the unwind_user interfaces can be used. Signed-off-by: Josh Poimboeuf Signed-off-by: Steven Rostedt (Google) --- arch/x86/Kconfig

[PATCH v7 01/17] unwind_user: Add user space unwinding API

2025-05-02 Thread Steven Rostedt
From: Josh Poimboeuf Introduce a generic API for unwinding user stacks. In order to expand user space unwinding to be able to handle more complex scenarios, such as deferred unwinding and reading user space information, create a generic interface that all architectures can use that support the v

[PATCH v7 00/17] unwind_user: perf: x86: Deferred unwinding infrastructure

2025-05-02 Thread Steven Rostedt
[ Shorten the Cc list to just those that maintain this ] This series does not make any user space visible changes. It only adds the necessary infrastructure of the deferred unwinder and makes a few helpful cleanups to perf. Based off of tip/master: 252d33c92dbc23bcc1e662a889787c09a02eeccc Pet

Re: [PATCH v5 00/25] context_tracking,x86: Defer some IPIs until a user->kernel transition

2025-05-02 Thread Valentin Schneider
On 02/05/25 06:53, Dave Hansen wrote: > On 5/2/25 02:55, Valentin Schneider wrote: >> My gripe with that was having two separate mechanisms >> - super early entry around SWITCH_TO_KERNEL_CR3) >> - later entry at context tracking > > What do you mean by "later entry"? > I meant the point at which t

Re: [PATCH v5 07/12] khugepaged: add mTHP support

2025-05-02 Thread David Hildenbrand
On 02.05.25 17:30, Nico Pache wrote: On Fri, May 2, 2025 at 9:27 AM Jann Horn wrote: On Fri, May 2, 2025 at 5:19 PM David Hildenbrand wrote: On 02.05.25 14:50, Jann Horn wrote: On Fri, May 2, 2025 at 8:29 AM David Hildenbrand wrote: On 02.05.25 00:29, Nico Pache wrote: On Wed, Apr 30, 2

Re: [PATCH v5 07/12] khugepaged: add mTHP support

2025-05-02 Thread David Hildenbrand
On 02.05.25 17:24, Lorenzo Stoakes wrote: On Fri, May 02, 2025 at 05:18:54PM +0200, David Hildenbrand wrote: On 02.05.25 14:50, Jann Horn wrote: On Fri, May 2, 2025 at 8:29 AM David Hildenbrand wrote: On 02.05.25 00:29, Nico Pache wrote: On Wed, Apr 30, 2025 at 2:53 PM Jann Horn wrote: On

Re: [PATCH v5 07/12] khugepaged: add mTHP support

2025-05-02 Thread Nico Pache
On Fri, May 2, 2025 at 9:27 AM Jann Horn wrote: > > On Fri, May 2, 2025 at 5:19 PM David Hildenbrand wrote: > > > > On 02.05.25 14:50, Jann Horn wrote: > > > On Fri, May 2, 2025 at 8:29 AM David Hildenbrand wrote: > > >> On 02.05.25 00:29, Nico Pache wrote: > > >>> On Wed, Apr 30, 2025 at 2:53 P

Re: [PATCH v5 07/12] khugepaged: add mTHP support

2025-05-02 Thread Jann Horn
On Fri, May 2, 2025 at 5:19 PM David Hildenbrand wrote: > > On 02.05.25 14:50, Jann Horn wrote: > > On Fri, May 2, 2025 at 8:29 AM David Hildenbrand wrote: > >> On 02.05.25 00:29, Nico Pache wrote: > >>> On Wed, Apr 30, 2025 at 2:53 PM Jann Horn wrote: > > On Mon, Apr 28, 2025 at 8:12 

Re: [PATCH v5 07/12] khugepaged: add mTHP support

2025-05-02 Thread Lorenzo Stoakes
On Fri, May 02, 2025 at 05:18:54PM +0200, David Hildenbrand wrote: > On 02.05.25 14:50, Jann Horn wrote: > > On Fri, May 2, 2025 at 8:29 AM David Hildenbrand wrote: > > > On 02.05.25 00:29, Nico Pache wrote: > > > > On Wed, Apr 30, 2025 at 2:53 PM Jann Horn wrote: > > > > > > > > > > On Mon, Apr

Re: [PATCH v5 00/25] context_tracking,x86: Defer some IPIs until a user->kernel transition

2025-05-02 Thread Peter Zijlstra
On Fri, May 02, 2025 at 07:33:55AM -0700, Dave Hansen wrote: > On 5/2/25 04:22, Peter Zijlstra wrote: > > On Wed, Apr 30, 2025 at 11:07:35AM -0700, Dave Hansen wrote: > > > >> Both AMD and Intel have hardware to do it. ARM CPUs do it too, I think. > >> You can go buy the Intel hardware off the she

Re: [PATCH v5 07/12] khugepaged: add mTHP support

2025-05-02 Thread David Hildenbrand
On 02.05.25 14:50, Jann Horn wrote: On Fri, May 2, 2025 at 8:29 AM David Hildenbrand wrote: On 02.05.25 00:29, Nico Pache wrote: On Wed, Apr 30, 2025 at 2:53 PM Jann Horn wrote: On Mon, Apr 28, 2025 at 8:12 PM Nico Pache wrote: Introduce the ability for khugepaged to collapse to different

Re: [PATCH v5 00/25] context_tracking,x86: Defer some IPIs until a user->kernel transition

2025-05-02 Thread Dave Hansen
On 5/2/25 04:22, Peter Zijlstra wrote: > On Wed, Apr 30, 2025 at 11:07:35AM -0700, Dave Hansen wrote: > >> Both AMD and Intel have hardware to do it. ARM CPUs do it too, I think. >> You can go buy the Intel hardware off the shelf today. > To be fair, the Intel RAR thing is pretty horrific :-( Defini

Re: [PATCH bpf-next 1/4] bpf: Allow get_func_[arg|arg_cnt] helpers in raw tracepoint programs

2025-05-02 Thread Leon Hwang
On 2025/5/1 00:53, Alexei Starovoitov wrote: > On Wed, Apr 30, 2025 at 8:55 AM Leon Hwang wrote: >> >> >> >> On 2025/4/30 20:43, Kafai Wan wrote: >>> On Wed, Apr 30, 2025 at 10:46 AM Alexei Starovoitov >>> wrote: On Sat, Apr 26, 2025 at 9:00 AM KaFai Wan wrote: > >> [...] >> >

Re: [PATCH v5 00/25] context_tracking,x86: Defer some IPIs until a user->kernel transition

2025-05-02 Thread Dave Hansen
On 5/2/25 02:55, Valentin Schneider wrote: > My gripe with that was having two separate mechanisms > - super early entry around SWITCH_TO_KERNEL_CR3) > - later entry at context tracking What do you mean by "later entry"? All of the paths to enter the kernel from userspace have some SWITCH_TO_KERN

Re: [PATCH v2 2/2] tracing: protect trace_probe_log with mutex

2025-05-02 Thread Steven Rostedt
On Fri, 02 May 2025 15:15:53 +0200 Paul Cacheux via B4 Relay wrote: > From: Paul Cacheux > > The shared trace_probe_log variable can be accessed and modified > by multiple processes using tracefs at the same time, this new > mutex will guarantee it's always in a coherent state. > > There is no

[PATCH v2 0/2] tracing: fix race when creating trace probe log error message

2025-05-02 Thread Paul Cacheux via B4 Relay
Hello, As reported in [1] a race exists in the shared trace probe log used to build error messages. This can cause kernel crashes when building the actual error message, but the race happens even for non-error tracefs uses, it's just not visible. Reproducer first reported that is still crashing:

[PATCH v2 2/2] tracing: protect trace_probe_log with mutex

2025-05-02 Thread Paul Cacheux via B4 Relay
From: Paul Cacheux The shared trace_probe_log variable can be accessed and modified by multiple processes using tracefs at the same time, this new mutex will guarantee it's always in a coherent state. There is no guarantee that multiple errors happening at the same time will each have the correc

[PATCH v2 1/2] tracing: add missing trace_probe_log_clear for eprobes

2025-05-02 Thread Paul Cacheux via B4 Relay
From: Paul Cacheux Make sure trace_probe_log_clear is called in the tracing eprobe code path, matching the trace_probe_log_init call. Signed-off-by: Paul Cacheux --- kernel/trace/trace_eprobe.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/trace/trace_eprobe.c b/kernel/trace/tr

Re: [PATCH v5 07/12] khugepaged: add mTHP support

2025-05-02 Thread Jann Horn
On Fri, May 2, 2025 at 8:29 AM David Hildenbrand wrote: > On 02.05.25 00:29, Nico Pache wrote: > > On Wed, Apr 30, 2025 at 2:53 PM Jann Horn wrote: > >> > >> On Mon, Apr 28, 2025 at 8:12 PM Nico Pache wrote: > >>> Introduce the ability for khugepaged to collapse to different mTHP sizes. > >>> Wh

Re: [PATCH v5 00/25] context_tracking,x86: Defer some IPIs until a user->kernel transition

2025-05-02 Thread Peter Zijlstra
On Wed, Apr 30, 2025 at 11:07:35AM -0700, Dave Hansen wrote: > Both AMD and Intel have hardware to do it. ARM CPUs do it too, I think. > You can go buy the Intel hardware off the shelf today. To be fair, the Intel RAR thing is pretty horrific :-( Definitely sub-par compared to the AMD and ARM thi

Re: [PATCH v5 00/25] context_tracking,x86: Defer some IPIs until a user->kernel transition

2025-05-02 Thread Valentin Schneider
On 30/04/25 13:00, Dave Hansen wrote: > On 4/30/25 12:42, Steven Rostedt wrote: >>> Look at the syscall code for instance: >>> SYM_CODE_START(entry_SYSCALL_64) swapgs movq%rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2) SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp >>

Re: [PATCH 22/22] man2: Add uprobe syscall page

2025-05-02 Thread Jiri Olsa
On Thu, May 01, 2025 at 11:26:46PM +0200, Alejandro Colomar wrote: > Hi Jiri, > > On Tue, Apr 22, 2025 at 10:45:41PM +0200, Alejandro Colomar wrote: > > On Tue, Apr 22, 2025 at 04:01:56PM +0200, Jiri Olsa wrote: > > > > > +is an alternative to breakpoint instructions > > > > > +for triggering entr