Re: [PATCH] arm64: uprobes: Simulate STP for pushing fp/lr into user stack

2024-10-28 Thread Liao, Chang
On 2024/10/24 22:06, Mark Rutland wrote: > On Tue, Sep 10, 2024 at 06:04:07AM +0000, Liao Chang wrote: >> This patch is the second part of a series to improve the selftest bench >> of uprobe/uretprobe [0]. The lack of simulating 'stp fp, lr, [sp, #imm]' >> signif

[PATCH v4 0/2] uprobes: Improve scalability by reducing the contention on siglock

2024-10-22 Thread Liao Chang
nstead of new UPROBE_SSTEP state. [1] https://lore.kernel.org/all/20240731214256.3588718-1-and...@kernel.org [2] https://lore.kernel.org/all/20240727094405.1362496-1-liaocha...@huawei.com [3] https://lore.kernel.org/all/20240815014629.2685155-1-liaocha...@huawei.com/ Liao Chang (2): uprobes: Re

[PATCH v4 1/2] uprobes: Remove redundant spinlock in uprobe_deny_signal()

2024-10-22 Thread Liao Chang
Since clearing a bit in thread_info is an atomic operation, the spinlock is redundant and can be removed; reducing lock contention is good for performance. Acked-by: Masami Hiramatsu (Google) Acked-by: Oleg Nesterov Signed-off-by: Liao Chang --- kernel/events/uprobes.c | 2 -- 1 file changed
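The reasoning in this commit message can be illustrated with a minimal user-space sketch. It is not the kernel code: it uses C11 atomics as a stand-in for the kernel's atomic bit operations, and the flag names and helper functions (DEMO_FLAG_PENDING, DEMO_FLAG_OTHER, demo_clear_flag, demo_test_flag) are hypothetical. It only shows that a single atomic read-modify-write clears one bit without disturbing its neighbours, so no lock is needed around that step alone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for flags kept in one word. */
#define DEMO_FLAG_PENDING (1UL << 0)
#define DEMO_FLAG_OTHER   (1UL << 1)

/*
 * Clear one bit with a single atomic AND. Concurrent updates to other
 * bits in the same word are not lost, so no surrounding lock is needed
 * for this operation by itself.
 */
static void demo_clear_flag(atomic_ulong *flags, unsigned long bit)
{
    assert(flags != NULL);
    atomic_fetch_and(flags, ~bit);
}

/* Read the word atomically and report whether the given bit is set. */
static bool demo_test_flag(atomic_ulong *flags, unsigned long bit)
{
    assert(flags != NULL);
    return (atomic_load(flags) & bit) != 0;
}
```

A caller would initialise an atomic_ulong with atomic_init(), call demo_clear_flag() on it, and could then observe with demo_test_flag() that only the requested bit changed.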

[PATCH v4 2/2] uprobes: Remove the spinlock within handle_singlestep()

2024-10-22 Thread Liao Chang
rg [2] https://lore.kernel.org/all/20240727094405.1362496-1-liaocha...@huawei.com Acked-by: Masami Hiramatsu (Google) Acked-by: Oleg Nesterov Signed-off-by: Liao Chang --- include/linux/uprobes.h | 1 + kernel/events/uprobes.c | 8 +--- 2 files changed, 6 insertions(+), 3 deletions(-)

Re: [PATCH v3 0/2] uprobes: Improve scalability by reducing the contention on siglock

2024-10-21 Thread Liao, Chang
On 2024/10/22 1:18, Andrii Nakryiko wrote: > On Mon, Oct 21, 2024 at 3:43 AM Liao, Chang wrote: >> >> >> >> On 2024/10/12 3:34, Andrii Nakryiko wrote: >>> On Tue, Sep 17, 2024 at 7:05 PM Liao, Chang wrote: >>>> >>>> Hi, Peter and Masami >

Re: [PATCH v2] uprobes: Improve the usage of xol slots for better scalability

2024-10-21 Thread Liao, Chang
On 2024/10/1 22:29, Oleg Nesterov wrote: > On 09/27, Liao Chang wrote: >> >> The uprobe handler allocates an xol slot from xol_area and quickly releases >> it in the single-step handler. The atomic operations on the xol bitmap >> and slot_count lead to expensive cache lin

Re: [PATCH v2] uprobes: Improve the usage of xol slots for better scalability

2024-10-21 Thread Liao, Chang
Oleg, Sorry for taking so long to reply. I have recently returned from a long vacation. On 2024/9/28 1:18, Oleg Nesterov wrote: > On 09/27, Liao Chang wrote: >> >> +int recycle_utask_slot(struct uprobe_task *utask, struct xol_area *area) >> +{ >> +

Re: [PATCH v2] arm64: insn: Simulate nop instruction for better uprobe performance

2024-10-21 Thread Liao, Chang
On 2024/10/10 18:52, Mark Rutland wrote: > On Mon, Sep 09, 2024 at 07:11:14AM +0000, Liao Chang wrote: >> v2->v1: >> 1. Remove the simulation of STP and the related bits. >> 2. Use arm64_skip_faulting_instruction for single-stepping or FEAT_BTI >>scenario. >>

Re: [PATCH v2] arm64: insn: Simulate nop instruction for better uprobe performance

2024-10-21 Thread Liao, Chang
On 2024/10/16 2:58, Catalin Marinas wrote: > On Mon, 09 Sep 2024 07:11:14 +0000, Liao Chang wrote: >> v2->v1: >> 1. Remove the simulation of STP and the related bits. >> 2. Use arm64_skip_faulting_instruction for single-stepping or FEAT_BTI >>scenario. >>

Re: [PATCH v3 0/2] uprobes: Improve scalability by reducing the contention on siglock

2024-10-21 Thread Liao, Chang
On 2024/10/12 3:34, Andrii Nakryiko wrote: > On Tue, Sep 17, 2024 at 7:05 PM Liao, Chang wrote: >> >> Hi, Peter and Masami >> >> I look forward to your input on this series. Andrii has proven it is >> helpful for uprobe scalability. >> >> Than

[PATCH v2] function_graph: Improve fgraph LRU data initialization

2024-09-30 Thread Liao Chang
ace scenario, then restore the definition without static keyword in the original patch [1]. And rebasing patch to next-20240927. [1] https://lore.kernel.org/all/20240912111550.1752115-1-liaocha...@huawei.com Signed-off-by: Liao Chang --- kernel/trace/fgraph.c |

Re: [PATCH] function_graph: Simplify the initialization of fgraph LRU data

2024-09-30 Thread Liao, Chang
On 2024/9/28 8:17, Steven Rostedt wrote: > On Thu, 12 Sep 2024 11:15:50 + > Liao Chang wrote: > >> This patch uses [first ... last] = value to initialize fgraph_array[]. >> And it declares all the callbacks in fgraph_stub as static, as they are >> not called fr

[PATCH v2] uprobes: Improve the usage of xol slots for better scalability

2024-09-27 Thread Liao Chang
m 1 to 0 when recycling slot from GC list. [1] https://lore.kernel.org/all/ZuwoUmqXrztp-Mzh@tassilo/ Signed-off-by: Liao Chang --- include/linux/uprobes.h | 4 + kernel/events/uprobes.c | 177 ++-- 2 files changed, 139 insertions(+), 42 deletions(-) diff --gi

Re: [PATCH] uprobes: Improve the usage of xol slots for better scalability

2024-09-27 Thread Liao, Chang
dubious. Andi, I've just sent v2. Looking forward to your feedback. Thanks. https://lore.kernel.org/all/20240927094549.3382916-1-liaocha...@huawei.com/ > > -Andi > -- BR Liao, Chang

Re: [PATCH] arm64: uprobes: Optimize cache flushes for xol slot

2024-09-26 Thread Liao, Chang
rate "fix info leak" patch. Do you mean to fill the entire page with CPU specific illegal instructions in this patch? > > Oleg. > > -- BR Liao, Chang

Re: [PATCH] arm64: uprobes: Optimize cache flushes for xol slot

2024-09-23 Thread Liao, Chang
On 2024/9/23 15:18, Will Deacon wrote: > On Mon, Sep 23, 2024 at 09:57:14AM +0800, Liao, Chang wrote: >> On 2024/9/20 23:32, Catalin Marinas wrote: >>> On Fri, Sep 20, 2024 at 04:58:31PM +0800, Liao, Chang wrote: >>>> On 2024/9/19 22:18, Oleg Nesterov wrote: >>>>

Re: [PATCH] arm64: uprobes: Optimize cache flushes for xol slot

2024-09-23 Thread Liao, Chang
On 2024/9/22 22:09, Will Deacon wrote: > On Fri, Sep 20, 2024 at 07:32:23PM +0200, Oleg Nesterov wrote: >> On 09/20, Catalin Marinas wrote: >>> >>> On Fri, Sep 20, 2024 at 04:58:31PM +0800, Liao, Chang wrote: >>>> >>>> >>>> On 2024

Re: [PATCH] uprobes: Improve the usage of xol slots for better scalability

2024-09-23 Thread Liao, Chang
if-body execution exclusive among the CPUs? If that's the case, I guess test_and_put_task_slot() is the equivalent of the race condition check. test_and_put_task_slot() uses a compare-and-exchange operation on the slot_ref of the utask instance. Regardless of the type of work being performed by another CPU, it will always bail out unless slot_ref has a value of one, indicating that the utask is free to access from the local CPU. > > -Andi > > -- BR Liao, Chang
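The compare-and-exchange check described in this reply can be sketched in user-space C11 as follows. This is only an illustration under stated assumptions, not the patch's test_and_put_task_slot(): the helper name demo_try_claim_slot is hypothetical, and the value written on success (0) is an assumption made for the sketch. The point is that exactly one caller can observe the value one and change it; every other caller bails out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Try to claim a slot whose reference count is exactly one.
 *
 * Returns true only for the caller that atomically changes the count
 * from 1 to 0 (the target value 0 is an assumption of this sketch).
 * Any other observed value leaves the count untouched and returns
 * false, i.e. the caller bails out.
 */
static bool demo_try_claim_slot(atomic_int *slot_ref)
{
    int expected = 1;

    assert(slot_ref != NULL);
    return atomic_compare_exchange_strong(slot_ref, &expected, 0);
}
```

If two CPUs call demo_try_claim_slot() on the same counter at the same time, the hardware compare-and-exchange guarantees that at most one of them sees the value one, so the body guarded by a successful return runs exclusively.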

Re: [PATCH] arm64: uprobes: Optimize cache flushes for xol slot

2024-09-22 Thread Liao, Chang
On 2024/9/20 23:32, Catalin Marinas wrote: > On Fri, Sep 20, 2024 at 04:58:31PM +0800, Liao, Chang wrote: >> >> >> On 2024/9/19 22:18, Oleg Nesterov wrote: >>> On 09/19, Liao Chang wrote: >>>> >>>> --- a/arch/arm64/kernel/probes/uprobes.c >

Re: [PATCH] arm64: uprobes: Optimize cache flushes for xol slot

2024-09-20 Thread Liao, Chang
On 2024/9/19 22:18, Oleg Nesterov wrote: > On 09/19, Liao Chang wrote: >> >> --- a/arch/arm64/kernel/probes/uprobes.c >> +++ b/arch/arm64/kernel/probes/uprobes.c >> @@ -17,12 +17,16 @@ void arch_uprobe_copy_ixol(struct page *page, unsigned >> long vaddr, >>

Re: [PATCH] uprobes: Improve the usage of xol slots for better scalability

2024-09-19 Thread Liao, Chang
On 2024/9/18 20:25, Andi Kleen wrote: > Liao Chang writes: >> + >> +/* >> + * xol_recycle_insn_slot - recycle a slot from the garbage collection list. >> + */ >> +static int xol_recycle_insn_slot(struct xol_area *area) >> +{ >> +struct uprobe_ta

[PATCH] arm64: uprobes: Optimize cache flushes for xol slot

2024-09-19 Thread Liao Chang
( 1.093M/s/cpu) Signed-off-by: Liao Chang --- arch/arm64/kernel/probes/uprobes.c | 4 1 file changed, 4 insertions(+) diff --git a/arch/arm64/kernel/probes/uprobes.c b/arch/arm64/kernel/probes/uprobes.c index d49aef2657cd..5ee27509d6f6 100644 --- a/arch/arm64/kernel/probes/uprobes.c +++ b

Re: [PATCH v3 0/2] uprobes: Improve scalability by reducing the contention on siglock

2024-09-17 Thread Liao, Chang
Hi, Peter and Masami I look forward to your input on this series. Andrii has proven it is helpful for uprobe scalability. Thanks. On 2024/9/15 23:18, Oleg Nesterov wrote: > Hi Liao, > > On 09/14, Liao, Chang wrote: >> >> Hi, Oleg >> >> Kindly ping. >>

[PATCH] uprobes: Improve the usage of xol slots for better scalability

2024-09-17 Thread Liao Chang
the associated xol slot will be free. Signed-off-by: Liao Chang --- include/linux/uprobes.h | 4 + kernel/events/uprobes.c | 173 ++-- 2 files changed, 135 insertions(+), 42 deletions(-) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.

Re: [PATCH v3 0/2] uprobes: Improve scalability by reducing the contention on siglock

2024-09-13 Thread Liao, Chang
Hi, Oleg Kindly ping. This series has been pending for a month. Is there any issue I overlooked? Thanks. On 2024/8/15 9:46, Liao Chang wrote: > The profiling result of BPF selftest on ARM64 platform reveals the > significant contention on the current->sighand->siglock is the > scalabi

[PATCH] function_graph: Simplify the initialization of fgraph LRU data

2024-09-12 Thread Liao Chang
This patch uses [first ... last] = value to initialize fgraph_array[]. And it declares all the callbacks in fgraph_stub as static, as they are not called from external code. Signed-off-by: Liao Chang --- include/linux/ftrace.h | 1 - kernel/trace/fgraph.c | 54
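The [first ... last] = value form mentioned here is the GNU C range designator extension, supported by GCC and Clang. A minimal sketch of the idea follows. The names (DEMO_MAX, struct demo_ops, demo_stub, demo_array, demo_count_stubs) are hypothetical and only mirror the shape of an array of pointers that all start out pointing at one static stub; this is not the code from kernel/trace/fgraph.c.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX 16 /* hypothetical array size */

static_assert(DEMO_MAX > 0, "range designator needs at least one element");

struct demo_ops {
    int (*entry)(int);
};

/* Callback kept static because nothing outside this file calls it. */
static int demo_stub_entry(int x)
{
    (void)x;
    return 0;
}

static struct demo_ops demo_stub = {
    .entry = demo_stub_entry,
};

/*
 * GNU range designator: every element from index 0 to DEMO_MAX - 1 is
 * initialized to &demo_stub at compile time, with no runtime loop.
 */
static struct demo_ops *demo_array[DEMO_MAX] = {
    [0 ... DEMO_MAX - 1] = &demo_stub,
};

/* Count how many entries still point at the stub. */
static size_t demo_count_stubs(void)
{
    size_t i, n = 0;

    for (i = 0; i < DEMO_MAX; i++)
        if (demo_array[i] == &demo_stub)
            n++;
    return n;
}
```

Compared with filling the array in an init function, the static initializer means the array is valid before any code runs; the trade-off is that the syntax is a compiler extension rather than standard C.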

Re: [PATCH] arm64: uprobes: Simulate STP for pushing fp/lr into user stack

2024-09-10 Thread Liao, Chang
On 2024/9/11 4:54, Andrii Nakryiko wrote: > On Mon, Sep 9, 2024 at 11:14 PM Liao Chang wrote: >> >> This patch is the second part of a series to improve the selftest bench >> of uprobe/uretprobe [0]. The lack of simulating 'stp fp, lr, [sp, #imm]' >>

Re: [PATCH] arm64: insn: Simulate nop and push instruction for better uprobe performance

2024-09-09 Thread Liao, Chang
Hi, Mark On 2024/9/6 17:39, Mark Rutland wrote: > On Tue, Aug 27, 2024 at 07:33:55PM +0800, Liao, Chang wrote: >> Hi, Mark >> >> Would you like to discuss this patch further, or do you still believe >> emulating >> STP to push FP/LR into the stack in kernel is not

[PATCH] arm64: uprobes: Simulate STP for pushing fp/lr into user stack

2024-09-09 Thread Liao Chang
cpu to 0.714M/s/cpu). [0] https://lore.kernel.org/all/caef4bzao4eg6hr2hzxypn+7uer4chs0r99zln02ezz5yruv...@mail.gmail.com/ [1] https://lore.kernel.org/all/Zr3RN4zxF5XPgjEB@J2N7QTR9R3/ [2] https://lore.kernel.org/all/20240815014629.2685155-1-liaocha...@huawei.com/ Signed-off-by: Liao Chang --- arch

Re: [PATCH] arm64: insn: Simulate nop and push instruction for better uprobe performance

2024-09-09 Thread Liao, Chang
On 2024/9/6 17:39, Mark Rutland wrote: > On Tue, Aug 27, 2024 at 07:33:55PM +0800, Liao, Chang wrote: >> Hi, Mark >> >> Would you like to discuss this patch further, or do you still believe >> emulating >> STP to push FP/LR into the stack in kernel is not a good

Re: [PATCH] arm64: insn: Simulate nop and push instruction for better uprobe performance

2024-09-09 Thread Liao, Chang
On 2024/9/6 4:17, Andrii Nakryiko wrote: > On Fri, Aug 30, 2024 at 2:25 AM Liao, Chang wrote: >> >> >> >> On 2024/8/30 3:26, Andrii Nakryiko wrote: >>> On Tue, Aug 27, 2024 at 4:34 AM Liao, Chang wrote: >>>> >>>> Hi, Mark >>>>

Re: [PATCH] arm64: insn: Simulate nop and push instruction for better uprobe performance

2024-09-09 Thread Liao, Chang
On 2024/9/7 1:46, Andrii Nakryiko wrote: > On Fri, Sep 6, 2024 at 2:39 AM Mark Rutland wrote: >> >> On Tue, Aug 27, 2024 at 07:33:55PM +0800, Liao, Chang wrote: >>> Hi, Mark >>> >>> Would you like to discuss this patch further, or do you still believe

[PATCH v2] arm64: insn: Simulate nop instruction for better uprobe performance

2024-09-09 Thread Liao Chang
addressed in a separate patch. [0] https://lore.kernel.org/all/caef4bzao4eg6hr2hzxypn+7uer4chs0r99zln02ezz5yruv...@mail.gmail.com/ [1] https://lore.kernel.org/all/Zr3RN4zxF5XPgjEB@J2N7QTR9R3/ CC: Andrii Nakryiko CC: Mark Rutland Signed-off-by: Liao Chang --- arch/arm64/include/asm/insn.h

[PATCH v3] arm64: Replace linked list with switch statement for BRK handlers

2024-09-05 Thread Liao Chang
ll ; done' & jobs=$(ps | grep 'do date' | head -n 2 | awk '{print $1}') echo kgdbts=V1F1000 > /sys/module/kgdbts/parameters/kgdbts sleep 10 kill -9 $jobs [0] https://lore.kernel.org/all/zs3lnykxl5sg2...@j2n7qtr9r3.cambridge.arm.com/ Suggested-by: Mark R

Re: [PATCH v2] arm64: Replace linked list with switch statement for breakpoint handlers

2024-09-02 Thread Liao, Chang
On 2024/9/2 16:58, Mark Rutland wrote: > On Sat, Aug 31, 2024 at 06:41:41AM +0000, Liao Chang wrote: >> v2->v1: >> Fix a bug of releasing spinlock in kgdb_arch_exit(). >> >> As suggested by Mark Rutland [0], this patch removes the linked list used >> for breakpoi