On 22.02.20 17:42, Igor Druzhinin wrote:
On 22/02/2020 06:05, Jürgen Groß wrote:
On 22.02.20 03:29, Igor Druzhinin wrote:
On 18/02/2020 12:21, Juergen Gross wrote:
Today the RCU handling in Xen is affecting scheduling in several ways.
It is raising sched softirqs without any real need and it requires
tasklets for rcu_barrier(), which interacts badly with core scheduling.
This small series repairs those issues.
Additionally some ASSERT()s are added for verification of sane rcu
handling. In order to avoid those triggering right away the obvious
violations are fixed.
I've done more testing of this with [1] and, unfortunately, it quite easily
deadlocks while without this series it doesn't.
Steps to repro:
- apply [1]
- take a host with considerable CPU count (~64)
- run a loop: xen-hptool smt-disable; xen-hptool smt-enable
[1] https://lists.xenproject.org/archives/html/xen-devel/2020-02/msg01383.html
Yeah, the reason for that is that rcu_barrier() is a nop in this
situation without my patch, because the stop_machine_run() it then
calls will just return -EBUSY.
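A toy model of the behaviour described above may help; it is a sketch under stated assumptions, not the actual Xen code. All toy_* names are made up for illustration, and a single-threaded flag stands in for the CPU hotplug lock. The point is only that when a trylock-style get_cpu_maps() fails, stop_machine_run() bails out with -EBUSY before running the callback, so an rcu_barrier() built on top of it returns without having waited for anything.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the hotplug lock: true while CPU hotplug holds it. */
static bool toy_hotplug_in_progress;
/* Number of successful toy_get_cpu_maps() calls not yet released. */
static int toy_readers;
/* Counts how often the barrier callback actually ran. */
static int toy_barrier_runs;

/* Toy get_cpu_maps(): a trylock that fails while hotplug is in progress. */
static bool toy_get_cpu_maps(void)
{
    if (toy_hotplug_in_progress)
        return false;
    toy_readers++;
    return true;
}

/* Toy put_cpu_maps(): releases what toy_get_cpu_maps() took. */
static void toy_put_cpu_maps(void)
{
    assert(toy_readers > 0);
    toy_readers--;
}

/* Toy callback standing in for the real per-CPU barrier work. */
static int toy_barrier_action(void *data)
{
    (void)data;
    toy_barrier_runs++;
    return 0;
}

/* Toy stop_machine_run(): refuses to run if the CPU maps cannot be pinned. */
static int toy_stop_machine_run(int (*fn)(void *), void *data)
{
    int ret;

    if (!toy_get_cpu_maps())
        return -EBUSY;          /* callback is never invoked */

    ret = fn(data);
    toy_put_cpu_maps();
    return ret;
}

/* Toy rcu_barrier(): just forwards the result, so -EBUSY means "did nothing". */
static int toy_rcu_barrier(void)
{
    return toy_stop_machine_run(toy_barrier_action, NULL);
}
```

With toy_hotplug_in_progress set, toy_rcu_barrier() returns -EBUSY and toy_barrier_runs stays at 0, which is the "nop" case; with the flag cleared the callback runs once and the call returns 0.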
Are you sure that's the reason? I always have the following stack on CPU0:
(XEN) [ 120.891143] *** Dumping CPU0 host state: ***
(XEN) [ 120.895909] ----[ Xen-4.13.0 x86_64 debug=y Not tainted ]----
(XEN) [ 120.902487] CPU: 0
(XEN) [ 120.905269] RIP: e008:[<ffff82d0802aa750>] smp_send_call_function_mask+0x40/0x43
(XEN) [ 120.913415] RFLAGS: 0000000000000286 CONTEXT: hypervisor
(XEN) [ 120.919389] rax: 0000000000000000 rbx: ffff82d0805ddb78 rcx: 0000000000000001
(XEN) [ 120.927362] rdx: ffff82d0805cdb00 rsi: ffff82d0805c7cd8 rdi: 0000000000000007
(XEN) [ 120.935341] rbp: ffff8300920bfbc0 rsp: ffff8300920bfbb8 r8: 000000000000003b
(XEN) [ 120.943310] r9: 0444444444444432 r10: 3333333333333333 r11: 0000000000000001
(XEN) [ 120.951282] r12: ffff82d0805ddb78 r13: 0000000000000001 r14: ffff8300920bfc18
(XEN) [ 120.959251] r15: ffff82d0802af646 cr0: 000000008005003b cr4: 00000000003506e0
(XEN) [ 120.967223] cr3: 00000000920b0000 cr2: ffff88820dffe7f8
(XEN) [ 120.973125] fsb: 0000000000000000 gsb: ffff88821e3c0000 gss: 0000000000000000
(XEN) [ 120.981094] ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) [ 120.988548] Xen code around <ffff82d0802aa750> (smp_send_call_function_mask+0x40/0x43):
(XEN) [ 120.997037] 85 f9 ff fb 48 83 c4 08 <5b> 5d c3 9c 58 f6 c4 02 74 02 0f 0b 55 48 89 e5
(XEN) [ 121.005442] Xen stack trace from rsp=ffff8300920bfbb8:
(XEN) [ 121.011080] ffff8300920bfc18 ffff8300920bfc00 ffff82d080242c84 ffff82d080389845
(XEN) [ 121.019145] ffff8300920bfc18 ffff82d0802af178 0000000000000000 0000001c1d27aff8
(XEN) [ 121.027200] 0000000000000000 ffff8300920bfc80 ffff82d0802af1fa ffff82d080289adf
(XEN) [ 121.035255] fffffffffffffd55 0000000000000000 0000000000000000 0000000000000000
(XEN) [ 121.043320] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) [ 121.051375] 000000000000003b 0000001c25e54bf1 0000000000000000 ffff8300920bfc80
(XEN) [ 121.059443] ffff82d0805c7300 ffff8300920bfcb0 ffff82d080245f4d ffff82d0802af4a2
(XEN) [ 121.067498] ffff82d0805c7300 ffff83042bb24f60 ffff82d08060f400 ffff8300920bfd00
(XEN) [ 121.075553] ffff82d080246781 ffff82d0805cdb00 ffff8300920bfd80 ffff82d0805c7040
(XEN) [ 121.083621] ffff82d0805cdb00 ffff82d0805cdb00 fffffffffffffff9 ffff8300920bffff
(XEN) [ 121.091674] 0000000000000000 ffff8300920bfd30 ffff82d0802425a5 ffff82d0805c7040
(XEN) [ 121.099739] ffff82d0805cdb00 fffffffffffffff9 ffff8300920bffff ffff8300920bfd40
(XEN) [ 121.107797] ffff82d0802425e5 ffff8300920bfd80 ffff82d08022bc0f 0000000000000000
(XEN) [ 121.115852] ffff82d08022b600 ffff82d0804b3888 ffff82d0805cdb00 ffff82d0805cdb00
(XEN) [ 121.123917] fffffffffffffff9 ffff8300920bfdb0 ffff82d0802425a5 0000000000000003
(XEN) [ 121.131975] 0000000000000001 00000000ffffffef ffff8300920bffff ffff8300920bfdc0
(XEN) [ 121.140037] ffff82d0802425e5 ffff8300920bfdd0 ffff82d08022b91b ffff8300920bfdf0
(XEN) [ 121.148093] ffff82d0802addb1 ffff83042b3b0000 0000000000000003 ffff8300920bfe30
(XEN) [ 121.156150] ffff82d0802ae086 ffff8300920bfe10 ffff83042b7e81e0 ffff83042b3b0000
(XEN) [ 121.164216] 0000000000000000 0000000000000000 0000000000000000 ffff8300920bfe50
(XEN) [ 121.172271] Xen call trace:
(XEN) [ 121.175573] [<ffff82d0802aa750>] R smp_send_call_function_mask+0x40/0x43
(XEN) [ 121.183024] [<ffff82d080242c84>] F on_selected_cpus+0xa4/0xde
(XEN) [ 121.189520] [<ffff82d0802af1fa>] F arch/x86/time.c#time_calibration+0x82/0x89
(XEN) [ 121.197403] [<ffff82d080245f4d>] F common/timer.c#execute_timer+0x49/0x64
(XEN) [ 121.204951] [<ffff82d080246781>] F common/timer.c#timer_softirq_action+0x116/0x24e
(XEN) [ 121.213271] [<ffff82d0802425a5>] F common/softirq.c#__do_softirq+0x85/0x90
(XEN) [ 121.220890] [<ffff82d0802425e5>] F process_pending_softirqs+0x35/0x37
(XEN) [ 121.228086] [<ffff82d08022bc0f>] F common/rcupdate.c#rcu_process_callbacks+0x1ef/0x20d
(XEN) [ 121.236758] [<ffff82d0802425a5>] F common/softirq.c#__do_softirq+0x85/0x90
(XEN) [ 121.244378] [<ffff82d0802425e5>] F process_pending_softirqs+0x35/0x37
(XEN) [ 121.251568] [<ffff82d08022b91b>] F rcu_barrier+0x58/0x6e
(XEN) [ 121.257639] [<ffff82d0802addb1>] F cpu_down_helper+0x11/0x32
(XEN) [ 121.264051] [<ffff82d0802ae086>] F arch/x86/sysctl.c#smt_up_down_helper+0x1d6/0x1fe
(XEN) [ 121.272454] [<ffff82d08020878d>] F common/domain.c#continue_hypercall_tasklet_handler+0x54/0xb8
(XEN) [ 121.281900] [<ffff82d0802454e6>] F common/tasklet.c#do_tasklet_work+0x81/0xb4
(XEN) [ 121.289786] [<ffff82d080245803>] F do_tasklet+0x58/0x85
(XEN) [ 121.295771] [<ffff82d08027a0b4>] F arch/x86/domain.c#idle_loop+0x87/0xcb
So it's not in the get_cpu_maps() loop. It seems to me it's not entering
time sync for some reason.
Interesting. Looking further into that.
At least time_calibration() is missing a call to get_cpu_maps().
Juergen